Abstract



A more sums than differences (MSTD) set is a finite subset $S$ of the integers such that $|S+S| > |S-S|$. We show that the probability that a uniform random subset of $\{0,1,\dots,n\}$ is an MSTD set approaches some limit $\rho > 4.28 \times 10^{-4}$. This improves the previous result of Martin and O'Bryant that there is a lower limit of at least $2 \times 10^{-7}$. Monte Carlo experiments suggest that $\rho \approx 4.5 \times 10^{-4}$. We present a deterministic algorithm that can compute $\rho$ up to arbitrary precision.

We also describe the structure of a random MSTD set $S \subseteq \{0,1,\dots,n\}$. We formalize the intuition that fringe elements are most significant, while middle elements are nearly unrestricted. For instance, the probability that any "middle" element is in $S$ approaches $1/2$ as $n \to \infty$, confirming a conjecture of Miller, Orosz, and Scheinerman.

In general, our results work for any specification of the number of missing sums and the number of missing differences of $S$, with MSTD sets being a special case.

1 Introduction

A more sums than differences (MSTD) set is a finite set $S$ of integers with $|S+S| > |S-S|$, where the sum set $S+S$ and the difference set $S-S$ are defined as

$$S+S = \{s_1+s_2 : s_1,s_2\in S\},\qquad S-S = \{s_1-s_2 : s_1,s_2\in S\}.$$

Since addition is commutative while subtraction is not, two distinct integers $s_1$ and $s_2$ generate one sum but two differences. This suggests that $S+S$ should "usually" be smaller than $S-S$. Thus we expect MSTD sets to be rare.

The first example of an MSTD set was found by Conway in the 1960s: $\{0,2,3,4,7,11,12,14\}$. The name MSTD was later given by Nathanson [8]. MSTD sets have recently become a popular research topic [...]. For older papers, see [...]. We refer the reader to [...] for the history of the problem.
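To make the definitions concrete, here is a minimal Python sketch that checks Conway's example by brute force. The helper names `sumset`, `diffset`, and `is_mstd` are our own illustrative choices, not notation from this paper.

```python
def sumset(S):
    """S + S = {s1 + s2 : s1, s2 in S}."""
    return {a + b for a in S for b in S}


def diffset(S):
    """S - S = {s1 - s2 : s1, s2 in S}."""
    return {a - b for a in S for b in S}


def is_mstd(S):
    """True when S has more sums than differences."""
    return len(sumset(S)) > len(diffset(S))


conway = {0, 2, 3, 4, 7, 11, 12, 14}
print(len(sumset(conway)), len(diffset(conway)), is_mstd(conway))  # 26 25 True
```

By contrast, the set $\{0,1,3\}$ has 6 sums and 7 differences, the behavior one would "usually" expect.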
In this paper, we address the following two questions regarding MSTD sets and their generalizations.



1. What is the probability that a random subset of $\{0,1,\dots,n\}$ is an MSTD set?

2. What is the structure of a typical random MSTD subset of $\{0,1,\dots,n\}$?

The first question was raised by Martin and O'Bryant [5]. Let $\rho_n$ be the probability that a uniformly chosen random subset of $\{0,1,\dots,n\}$ is an MSTD set. In [5] it was shown that $\rho_n > 2 \times 10^{-7}$ for all $n \ge 14$. This is a surprising result since it is contrary to our original intuition that MSTD sets should be rare. It is true that $\rho_n = 0$ for $n \le 13$, and $\rho_n$ is then monotonically increasing at least for $n < 26$. From this data, Martin and O'Bryant conjectured that $\rho_n$ approaches some limit, and they estimated this limit using Monte Carlo experiments.

Conjecture 1.1 (Martin and O'Bryant [5]). As $n \to \infty$, the proportion $\rho_n$ of MSTD sets converges to a limit of about $4.5 \times 10^{-4}$.

Previously it was not known whether $\rho_n$ converges. In this paper, we show that $\rho_n$ indeed approaches some limit $\rho$. We also give a deterministic algorithm which can, in principle, compute arbitrarily good lower and upper bounds for $\rho$.

Theorem 1.2. As $n \to \infty$, the proportion $\rho_n$ of MSTD sets converges to a limit $\rho > 4.28 \times 10^{-4}$.

Our numerical result is a significant improvement over Martin and O'Bryant's $2 \times 10^{-7}$. Unfortunately, limits of computation prevent us from giving a good upper bound. However, if we had unlimited computing power, our method could give provable bounds for $\rho$ to any desired precision.

Our proof, like that of Martin and O'Bryant, is non-constructive. As for constructive results, the densest families of MSTD subsets of $\{0,1,2,\dots,n\}$ constructed so far are due to Miller, Orosz, and Scheinerman [6] (with density $\Omega(1/n^4)$) and the author [...] (with density $\Theta(1/n)$). No explicit construction with $\Theta(1)$ density is known.

Our method for proving Theorem 1.2 can easily be adapted to answer other similar questions, such as:

1. What is the probability that a uniformly random subset $S \subseteq \{0,1,\dots,n\}$ has more differences than sums, i.e., $|S+S| < |S-S|$?

2. What is the probability that a uniformly random subset $S \subseteq \{0,1,\dots,n\}$ has an equal number of differences and sums, i.e., $|S+S| = |S-S|$?

3. What is the probability that a uniformly random subset $S \subseteq \{0,1,\dots,n\}$ is missing exactly $s$ sums and $d$ differences, i.e., $|S+S| = 2n+1-s$ and $|S-S| = 2n+1-d$, where $s$ and $d$ are fixed?

4. What is the probability that a uniformly random subset $S \subseteq \{0,1,\dots,n\}$ has exactly $x$ more sums than differences, i.e., $|S+S| - |S-S| = x$, where $x$ is fixed?
As we will show, in each case, as $n \to \infty$, each sequence of probabilities approaches some limit. Furthermore, we have a deterministic algorithm that can give arbitrarily good provable bounds for the limit.

Our general result works for any characterization of the number of missing sums and the number of missing differences of $S \subseteq \{0,1,\dots,n\}$, by which we mean the pair

$$\Delta(S) = \Delta_n(S) = \big(2n+1-|S+S|,\; 2n+1-|S-S|\big).$$

Let $\mathcal{D}$ denote some (possibly infinite) subset of $\mathbb{Z}_{\ge 0} \times \mathbb{Z}_{\ge 0}$. We would like to study the collection of subsets $S \subseteq \{0,1,\dots,n\}$ such that $\Delta(S) \in \mathcal{D}$. For instance, $\mathcal{D} = \{(s,d) : s < d\}$ corresponds to MSTD sets; the one-element set $\mathcal{D} = \{(s,d)\}$ corresponds to Question 3 above; $\mathcal{D} = \{(s,d) : d - s = x\}$ corresponds to Question 4 above.
Let

$$\rho_n^{\mathcal{D}} = 2^{-n-1}\,\big|\{S \subseteq \{0,1,\dots,n\} : \Delta(S) \in \mathcal{D}\}\big|.$$

This is the probability that a uniformly random subset of $\{0,1,\dots,n\}$ is characterized by $\mathcal{D}$. We prove the following generalization of Theorem 1.2. When $\mathcal{D}$ is the one-element set $\{(s,d)\}$, we abuse notation by writing $\rho_n^{(s,d)}$ and $\rho^{(s,d)}$ to mean $\rho_n^{\mathcal{D}}$ and $\rho^{\mathcal{D}}$.
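For small $n$ the probability that $\Delta_n(S)$ lies in a given set of pairs can be computed exactly by enumerating all $2^{n+1}$ subsets. The Python sketch below is only illustrative (it is infeasible much beyond $n \approx 20$), and representing the set of allowed pairs as a predicate `in_D(s, d)` is an assumption of this sketch. It also shows why parity of $d$ matters: for nonempty $S$, the set $S-S$ is symmetric and contains 0, so $|S-S|$ is odd and $d$ is even; the empty set is the only subset with odd $d$.

```python
from fractions import Fraction
from itertools import combinations


def missing_pair(S, n):
    """Delta_n(S) = (2n+1 - |S+S|, 2n+1 - |S-S|) for S a subset of {0,...,n}."""
    sums = {a + b for a in S for b in S}
    diffs = {a - b for a in S for b in S}
    return (2 * n + 1 - len(sums), 2 * n + 1 - len(diffs))


def rho_n(n, in_D):
    """Exact probability that a uniform random subset of {0,...,n} has
    Delta_n(S) in D, where D is given as a predicate in_D(s, d)."""
    hits = 0
    for r in range(n + 2):
        for S in combinations(range(n + 1), r):
            if in_D(*missing_pair(S, n)):
                hits += 1
    return Fraction(hits, 2 ** (n + 1))


print(missing_pair({0, 2, 3, 4, 7, 11, 12, 14}, 14))  # (3, 4): s < d, so MSTD
print(rho_n(8, lambda s, d: d % 2 == 1))            # 1/512: only the empty set
```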

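Theorem 1.3. For any $\mathcal{D} \subseteq \mathbb{Z}_{\ge 0} \times \mathbb{Z}_{\ge 0}$, the limit

$$\rho^{\mathcal{D}} = \lim_{n\to\infty} \rho_n^{\mathcal{D}}$$

exists. It is positive as long as $\mathcal{D}$ contains at least one element $(s,d)$ where $d$ is even. Furthermore,

$$\rho^{\mathcal{D}} = \sum_{(s,d)\in\mathcal{D}} \rho^{(s,d)}.$$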

Theorem 1.3 resolves Conjectures 2 and 19 of Martin and O'Bryant [5]. Specifically, they conjectured that the probabilities in Questions 1-3 above all have limits as $n \to \infty$, and also that $\sum_{s,d} \rho^{(s,d)} = 1$. The latter follows from Theorem 1.3 with $\mathcal{D} = \mathbb{Z}_{\ge 0} \times \mathbb{Z}_{\ge 0}$. Hegarty [1] showed that, for $d$ even, the limit $\rho^{(s,d)}$ is positive provided that it exists. (Note that $d$ is even for every nonempty $S$, since $S - S$ is symmetric about 0 and contains 0.) However, it was previously unknown whether any of these limits exists.

Our next result provides some insight into the structure of a random subset $S \subseteq \{0,1,\dots,n\}$ conditioned on $\Delta(S) \in \mathcal{D}$. We argue that, except for the fringe elements of $S$ (i.e., the numbers close to 0 or $n$), the middle elements are nearly unrestricted and independent of the fringe choices. The precise statement is found in Theorem 5.1. This intuition was key to Martin and O'Bryant's proof [5] that $\rho_n$ is bounded below. It was also used by Miller, Orosz, and Scheinerman [6] to construct a family of MSTD sets. However, previous work only applied the intuition to a relatively small proportion of all MSTD subsets. There has been no description of what "most" MSTD sets look like. Our result is the first rigorous formulation of this common intuition. The techniques used in this paper have also inspired a new approach to a different problem on counting numerical semigroups of a given genus [15].

For a uniformly random subset $S \subseteq \{0,1,\dots,n\}$ conditioned on $\Delta(S) \in \mathcal{D}$, our results imply that the middle segment of $S$ is close to being unrestricted. For instance, the probability that any "middle" element is in $S$ approaches $1/2$ as $n \to \infty$, thereby confirming (and generalizing) a conjecture of Miller, Orosz, and Scheinerman [6]. Also, the expectation and variance of the size of $S$ are asymptotically the same as those of the binomial distribution on $n+1$ elements with parameter $1/2$. The size distribution of $S$ also follows a central limit theorem.

This paper is organized as follows. We start by focusing exclusively on the MSTD problem. In Section 2 we show that the limit $\rho$ in Theorem 1.2 exists. In Section 3 we elaborate on issues pertaining to computing lower and upper bounds for $\rho$. Next we move to the general case of subsets $S$ satisfying $\Delta(S) \in \mathcal{D}$. In Section 4 we discuss how our methods for MSTD sets can be modified to prove Theorem 1.3. In Section 5 we study the structure of a random set $S$ satisfying $\Delta(S) \in \mathcal{D}$. Finally, in Section 6 we offer some concluding remarks.

2 The limiting proportion of MSTD sets 

In this section, we show that the proportion $\rho_n$ of MSTD sets converges to a limit. Although the proof contains many of the ingredients used in computing the limit, we defer to Section 3 any details that are only relevant to the computation.

Let us give some intuition for our proof. Let $S$ be a "typical" subset of $\{0,1,\dots,n\}$. As observed by Martin and O'Bryant [5], except for elements near the "fringe," most elements of $\{0,1,2,\dots,2n\}$ can be represented as a sum of two elements of $S$ in a large number of ways. Consequently, these elements will "typically" be in the sum set. As Martin and O'Bryant put it, "if we choose the 'fringe' of S cleverly, the middle of S will become largely irrelevant."

The authors then proceed by manually fixing a particular choice of fringe for $S$, thereby obtaining their lower bound for $\rho_n$. Unfortunately, fringe-fixing leads to very suboptimal lower bounds, since "most" MSTD sets do not have a particular fixed fringe profile.

Our idea is to let the fringe vary. For each particular fringe profile, we compute the proportion 
of subsets S with the given fringe profile and the additional property that all the middle sums, 
namely those that are not completely controlled by the fringe, are in S + S. Then we can obtain the 
total proportion of MSTD subsets by summing over all candidate fringe profiles. Doing this leaves 
out those potential MSTD sets with some missing middle sum. Fortunately, as we will show, sets 
missing middle sums occupy a very small proportion of all subsets. 

We begin by restricting ourselves to subsets $S \subseteq \{0,1,\dots,n\}$ with $0, n \in S$, and then relax this constraint in Section 2.5.

2.1 MSTD fringe pairs 

From now on, we use $[a,b]$ to denote the set $\{a, a+1, \dots, b\}$ if $a \le b$, or the empty set otherwise.
Let $S \subseteq [0,n]$. When searching for candidate fringe profiles for $S$, we want the fringe alone to already generate more sums than differences. More precisely, suppose we fix $S \cap [0,k] = A$ and $(n - S) \cap [0,k] = B$. Then $(S+S) \cap [0,k]$ is completely controlled by $A$, and $(S+S) \cap [2n-k, 2n]$ is completely controlled by $B$. Similarly, $(S-S) \cap (\pm[n-k,n])$ is completely controlled by $A$ and $B$. Suppose that we can choose the middle segment of $S$, i.e., $S \cap [k+1, n-k-1]$, so that every element of $[k+1, 2n-k-1]$ appears in $S+S$; then it would follow that $S$ is MSTD. So we would like to look for fringe profiles $(A,B)$ with the above properties. This is formalized in the following set of definitions. See Figure 1 for a visual illustration.

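[Figure 1: three number lines. $S$ on $[0,n]$, with fringe segments $[0,k]$ and $[n-k,n]$; $S+S$ on $[0,2n]$, with fringe regions $(A+A)\cap[0,k]$ and $2n-(B+B)\cap[0,k]$; $S-S$ on $[-n,n]$, with fringe regions $-n+(A+B)\cap[0,k]$ and $n-(A+B)\cap[0,k]$.]

Figure 1: The shaded areas are the regions of $S$, $S+S$, and $S-S$ that are completely controlled by the fringe $(A,B;k)$ of $S$.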



Definition 2.1. An MSTD fringe pair of order $k$ is a pair $(A,B)$ (also denoted $(A,B;k)$ to indicate the order), where $A$ and $B$ are both subsets of $[0,k]$, with $0 \in A$ and $0 \in B$, satisfying

$$|(A+A)\cap[0,k]| + |(B+B)\cap[0,k]| > 2\,|(A+B)\cap[0,k]|.$$

In Section 4 we consider a variation of fringe pairs to deal with generalizations of MSTD sets.
We impose the following partial order on the set of all MSTD fringe pairs: $(A,B;k) \ge (A',B';k')$ if $k \ge k'$ and

$$A' = A\cap[0,k'],\quad B' = B\cap[0,k'],\quad [k'+1,k] \subseteq A+A,\quad [k'+1,k] \subseteq B+B. \tag{1}$$

Definition 2.2. A minimal MSTD fringe pair is an MSTD fringe pair $(A,B;k)$ for which there does not exist another MSTD fringe pair $(A',B';k')$ with $(A,B;k) > (A',B';k')$.

It is not hard to show that, to determine whether an MSTD fringe pair is minimal, it suffices to check (1) for $k' = k-1$. We use this fact in the computer search for minimal MSTD fringe pairs.

Example 2.3. There are no MSTD fringe pairs of order less than 6. The minimal MSTD fringe pairs of order 6 are

A            B              k
{0}          {0,1,3}        6
{0}          {0,2,3}        6
{0,1,3}      {0,1,2,4}      6
{0,2,3}      {0,1,2,5}      6

as well as the four others obtained by switching A and B. The minimal MSTD fringe pairs of order 7 are

A            B              k
{0}          {0,1,3}        7
{0}          {0,2,3}        7
{0}          {0,1,3,4}      7
{0}          {0,1,2,5}      7
{0,1,3,4}    {0,1,2,5}      7

as well as the five others obtained by switching A and B. There are ten non-minimal MSTD fringe pairs of order 7. They are

A            B              k
{0,1,2,5}    {0,2,3,7}      7
{0,7}        {0,1,3,7}      7
{0,7}        {0,2,3,7}      7
{0,1,3,7}    {0,1,2,4,7}    7
{0,2,3,7}    {0,1,2,5,7}    7

as well as the five others obtained by switching A and B.
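Definitions 2.1 and 2.2 and the partial order (1) can be checked mechanically. The following Python sketch (the function names are our own) tests minimality only against $k' = k-1$, as permitted by the remark after Definition 2.2, and confirms several entries of the tables above, including one of the non-minimal pairs.

```python
def trunc_sumset(X, Y, k):
    """(X + Y) ∩ [0, k]."""
    return {x + y for x in X for y in Y if x + y <= k}


def is_fringe_pair(A, B, k):
    """Definition 2.1 for subsets A, B of [0, k]."""
    if 0 not in A or 0 not in B or max(A) > k or max(B) > k:
        return False
    return (len(trunc_sumset(A, A, k)) + len(trunc_sumset(B, B, k))
            > 2 * len(trunc_sumset(A, B, k)))


def dominates(A, B, k, A2, B2, k2):
    """Relation (1): (A, B; k) >= (A2, B2; k2)."""
    gap = set(range(k2 + 1, k + 1))
    return (k >= k2
            and set(A2) == {a for a in A if a <= k2}
            and set(B2) == {b for b in B if b <= k2}
            and gap <= trunc_sumset(A, A, k)
            and gap <= trunc_sumset(B, B, k))


def is_minimal(A, B, k):
    """Minimal MSTD fringe pair, checking (1) only for k' = k - 1."""
    if not is_fringe_pair(A, B, k):
        return False
    A2 = {a for a in A if a <= k - 1}
    B2 = {b for b in B if b <= k - 1}
    return not (is_fringe_pair(A2, B2, k - 1)
                and dominates(A, B, k, A2, B2, k - 1))


order6 = [({0}, {0, 1, 3}), ({0}, {0, 2, 3}),
          ({0, 1, 3}, {0, 1, 2, 4}), ({0, 2, 3}, {0, 1, 2, 5})]
print(all(is_minimal(A, B, 6) and is_minimal(B, A, 6) for A, B in order6))  # True
```

For instance, $(\{0,1,2,5\},\{0,2,3,7\};7)$ is an MSTD fringe pair but dominates $(\{0,1,2,5\},\{0,2,3\};6)$, because $7 = 2+5 = 0+7$ lies in both $A+A$ and $B+B$.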

Definition 2.4. Let $S \subseteq [0,n]$. We say that $S$ is a rich MSTD set with MSTD fringe pair $(A,B;k)$ if

$$2k < n,\quad S\cap[0,k] = A,\quad (n-S)\cap[0,k] = B,\quad\text{and}\quad [k+1,2n-k-1] \subseteq S+S.$$

The order of the rich MSTD set $S$ is the smallest possible value of $k$ for which there exists such an MSTD fringe pair $(A,B;k)$.

As expected, rich MSTD sets are MSTD, as we shall prove in a moment. We choose the name 
rich because S is rich in sums in the middle. Also, as we will see, they represent a rich collection 
of MSTD sets. 

Next we prove some simple facts about rich MSTD sets and their MSTD fringe pairs. The goal is to show that we can count rich MSTD sets by going through the list of minimal MSTD fringe pairs. The proofs are mostly straightforward and can be skipped if desired.

Lemma 2.5. A rich MSTD set is an MSTD set. 

Proof. Let $S \subseteq [0,n]$ be a rich MSTD set with MSTD fringe pair $(A,B;k)$. We need to show that $|S+S| > |S-S|$. Since $2k < n$, the three intervals in each line below are disjoint and cover $[0,2n]$ and $[-n,n]$ respectively, so it suffices to show that

$$|(S+S)\cap([0,k]\cup[2n-k,2n])| > |(S-S)\cap([-n,-n+k]\cup[n-k,n])|, \tag{2}$$
$$|(S+S)\cap[k+1,2n-k-1]| \ge |(S-S)\cap[-n+k+1,n-k-1]|. \tag{3}$$

The inequality (3) immediately follows from the requirement $[k+1,2n-k-1] \subseteq S+S$, since then its left-hand side equals $2n-2k-1$, the size of the interval on the right. For (2), we note that

$$\begin{aligned}
(S+S)\cap[0,k] &= (A+A)\cap[0,k],\\
(S+S)\cap[2n-k,2n] &= ((n-B)+(n-B))\cap[2n-k,2n] = 2n-(B+B)\cap[0,k],\\
(S-S)\cap[-n,-n+k] &= (A-(n-B))\cap[-n,-n+k] = (A+B)\cap[0,k]-n,\\
(S-S)\cap[n-k,n] &= ((n-B)-A)\cap[n-k,n] = n-(A+B)\cap[0,k].
\end{aligned}$$

Hence the sizes of the above four sets are $|(A+A)\cap[0,k]|$, $|(B+B)\cap[0,k]|$, $|(A+B)\cap[0,k]|$, and $|(A+B)\cap[0,k]|$, respectively. Then (2) follows from $(A,B;k)$ being an MSTD fringe pair. □
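As a sanity check on Definition 2.4 and Lemma 2.5, the Python sketch below builds one rich MSTD set by hand from the minimal fringe pair $(\{0\},\{0,1,3\};6)$ of Example 2.3, with $n = 20$. The helper `build_rich_candidate` is hypothetical, and including every middle element is just one convenient choice that happens to put every middle sum into $S+S$ here.

```python
def build_rich_candidate(A, B, k, n):
    """Hypothetical helper: S with S ∩ [0,k] = A, (n - S) ∩ [0,k] = B,
    and every middle element of [k+1, n-k-1] included."""
    assert 2 * k < n
    return set(A) | set(range(k + 1, n - k)) | {n - b for b in B}


def is_rich(S, A, B, k, n):
    """Definition 2.4: the fringe conditions plus [k+1, 2n-k-1] ⊆ S + S."""
    sums = {a + b for a in S for b in S}
    return (2 * k < n
            and {s for s in S if s <= k} == set(A)
            and {n - s for s in S if n - s <= k} == set(B)
            and set(range(k + 1, 2 * n - k)) <= sums)


S = build_rich_candidate({0}, {0, 1, 3}, 6, 20)
sums = {a + b for a in S for b in S}
diffs = {a - b for a in S for b in S}
print(is_rich(S, {0}, {0, 1, 3}, 6, 20), len(sums), len(diffs))  # True 34 33
```

The counts match the proof: $|S+S| = 1 + 6 + 27 = 34$ (fringe sums plus the $2n-2k-1$ middle sums) and $|S-S| = 2\cdot 3 + 27 = 33$.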

A rich MSTD set may have many choices for its fringe pair. The following lemma shows that 
the set of MSTD fringe pairs of a particular rich MSTD set forms a chain in the partial order. 

Lemma 2.6. Let $S \subseteq [0,n]$ be a rich MSTD set. Let $(A,B;k)$ and $(A',B';k')$ be two MSTD fringe pairs of $S$. If $k = k'$, then $(A,B;k) = (A',B';k')$. If $k > k'$, then $(A,B;k) > (A',B';k')$.

Proof. If $k = k'$, then $A = A' = S\cap[0,k]$ and $B = B' = (n-S)\cap[0,k]$. So $(A,B;k) = (A',B';k')$.

If $k > k'$, then $A' = S\cap[0,k'] = A\cap[0,k']$ and $B' = (n-S)\cap[0,k'] = B\cap[0,k']$. Since $S$ is rich with fringe pair $(A',B';k')$, we see that $[k'+1,2n-k'-1] \subseteq S+S$. A sum in $[0,k]$ can only come from a sum of two elements in $[0,k]$, so $[k'+1,k] \subseteq A+A$. Similarly, $[k'+1,k] \subseteq B+B$. Hence $(A,B;k) > (A',B';k')$. □

Thus, for a rich MSTD set of order k, we can speak of its minimal MSTD fringe pair, which 
necessarily has order k. 

Lemma 2.7. Let $S \subseteq [0,n]$ be a rich MSTD set, and let $(A,B;k)$ be the minimal MSTD fringe pair of $S$. Then $(A,B;k)$ is minimal in the partial ordering of all MSTD fringe pairs. Also, for every $k \le k' < n/2$, $(A',B';k')$ is also an MSTD fringe pair of $S$, where $A' = S\cap[0,k']$ and $B' = (n-S)\cap[0,k']$, and every MSTD fringe pair of $S$ has this form.

Proof. Suppose that $(A,B;k)$ is not a minimal MSTD fringe pair, so that we have $(A',B';k') < (A,B;k)$ for some MSTD fringe pair $(A',B';k')$. Then $A' = A\cap[0,k'] = S\cap[0,k']$ and $B' = B\cap[0,k'] = (n-S)\cap[0,k']$. Also, $[k'+1,k]$ is contained in $A+A$ and $B+B$, and $[k+1,2n-k-1] \subseteq S+S$ (since $S$ is rich of order $k$), so that $[k'+1,2n-k'-1] \subseteq S+S$. Hence $(A',B';k')$ is also a fringe pair of $S$, thereby contradicting the choice of $(A,B;k)$ as the minimal MSTD fringe pair of $S$.
For the second claim, where $k < k'$, we see that $[k+1,k']$ is contained in $A'+A'$ and $B'+B'$, as $[k+1,2n-k-1] \subseteq S+S$. Since $(A,B;k)$ is an MSTD fringe pair, we have

$$\begin{aligned}
|(A'+A')\cap[0,k']| + |(B'+B')\cap[0,k']| &= |(A+A)\cap[0,k]| + |(B+B)\cap[0,k]| + 2(k'-k)\\
&> 2|(A+B)\cap[0,k]| + 2(k'-k)\\
&\ge 2|(A'+B')\cap[0,k']|.
\end{aligned}$$

Hence $(A',B';k')$ is also an MSTD fringe pair. The rest of the lemma is clear. □
Therefore, we can count rich MSTD sets by their minimal MSTD fringe pairs. 

2.2 Semi-rich sets 

We are interested in counting the number of rich MSTD sets with a given MSTD fringe pair. It 
turns out that we can divide this problem into two halves: the front half and back half. In this 
section we show how to compute the relevant limiting proportions for each half. In the next 
section we show how to put the two halves together. 

Definition 2.8. We say that $T \subseteq [0,n]$, where $n \ge k$, is a $k$-semi-rich set if $[k+1,n] \subseteq T+T$. We say that $T$ has prefix $(A;k)$ where $A = T\cap[0,k]$.

For $n \ge k$ and $A \subseteq [0,k]$ (with $0 \in A$), let

$$\sigma_n(A;k) = 2^{-n}\,\big|\{T \subseteq [0,n] : T\cap[0,k] = A,\ [k+1,n] \subseteq T+T\}\big|. \tag{4}$$

In other words, $\sigma_n(A;k)$ is the probability that a uniformly random subset $S \subseteq [0,n]$ (conditioned on $0 \in S$) is $k$-semi-rich with prefix $(A;k)$. In this section, we show that $\sigma_n(A;k)$ converges to a limit and give a formula for computing this limit.

Proposition 2.9. For every $A \subseteq [0,k]$ with $0 \in A$, the limit

$$\sigma(A;k) = \lim_{n\to\infty} \sigma_n(A;k)$$

exists and is positive.

Proof. We compute the size of the collection in (4) by considering the complement. We know that

$$\sigma_n(A;k)\,2^n = 2^{n-k} - \big|\{T \subseteq [0,n] : T\cap[0,k] = A,\ [k+1,n] \not\subseteq T+T\}\big|. \tag{5}$$

Observe that the set on the RHS can be partitioned by the smallest element of $[k+1,n]$ not in $T+T$, that is,

$$\{T \subseteq [0,n] : T\cap[0,k] = A,\ [k+1,n] \not\subseteq T+T\} = \biguplus_{j=k+1}^{n} \{T \subseteq [0,n] : T\cap[0,k] = A,\ [k+1,j-1] \subseteq T+T,\ j \notin T+T\},$$

where $\biguplus$ denotes disjoint union. We introduce the following quantity for $j > k$:

$$G_j(A;k) = \big|\{T \subseteq [0,j] : T\cap[0,k] = A,\ [k+1,j-1] \subseteq T+T,\ j \notin T+T\}\big|.$$

Then, for $k < j \le n$,

$$\begin{aligned}
\big|\{T \subseteq [0,n] : T\cap[0,k] = A,\ [k+1,j-1] \subseteq T+T,\ j \notin T+T\}\big|
&= 2^{n-j}\,\big|\{T \subseteq [0,j] : T\cap[0,k] = A,\ [k+1,j-1] \subseteq T+T,\ j \notin T+T\}\big|\\
&= G_j(A;k)\,2^{n-j},
\end{aligned}$$

since both conditions depend only on $T\cap[0,j]$, so $T\cap[j+1,n]$ can be chosen arbitrarily. It follows from (5) that

$$\sigma_n(A;k)\,2^n = 2^{n-k} - \sum_{j=k+1}^{n} G_j(A;k)\,2^{n-j}.$$

So

$$\sigma_n(A;k) = 2^{-k} - \sum_{j=k+1}^{n} G_j(A;k)\,2^{-j},$$

and hence

$$\sigma(A;k) = \lim_{n\to\infty} \sigma_n(A;k) = 2^{-k} - \sum_{j=k+1}^{\infty} G_j(A;k)\,2^{-j}. \tag{6}$$

In particular, the limit exists since the quantities $G_j(A;k)$ and $\sigma_n(A;k)$ are all non-negative: the partial sums $\sum_{j=k+1}^{n} G_j(A;k)2^{-j}$ are non-decreasing in $n$ and bounded above by $2^{-k}$. The argument for $\sigma(A;k) > 0$ is very similar to the arguments in [5], so we only sketch the idea. Basically, if we choose a sufficiently large $\ell$ (depending on $k$), require that $[k+1,\ell] \subseteq T$, and then choose $T\cap[\ell+1,n]$ randomly, then the probability that $[k+1,n] \subseteq T+T$ is bounded below by a positive constant, thereby making $T$ semi-rich (the idea is very similar to Lemma 2.12). □
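The identity $\sigma_n(A;k) = 2^{-k} - \sum_{j=k+1}^{n} G_j(A;k)2^{-j}$ derived in the proof above can be verified by brute force for small parameters. This Python sketch uses exact rational arithmetic; the prefixes tested ($\{0\}$ with $k=0$, and $\{0,1,3\}$ with $k=3$) are arbitrary choices for illustration.

```python
from fractions import Fraction
from itertools import product


def with_prefix(A, k, m):
    """Yield every T ⊆ [0, m] with T ∩ [0, k] = A (requires m >= k)."""
    for bits in product((0, 1), repeat=m - k):
        yield set(A) | {k + 1 + i for i, b in enumerate(bits) if b}


def sumset(T):
    return {a + b for a in T for b in T}


def sigma_n(A, k, n):
    """Definition (4): 2^{-n} * #{T ⊆ [0,n] : T ∩ [0,k] = A, [k+1,n] ⊆ T+T}."""
    need = set(range(k + 1, n + 1))
    good = sum(1 for T in with_prefix(A, k, n) if need <= sumset(T))
    return Fraction(good, 2 ** n)


def G(j, A, k):
    """G_j(A;k): T ⊆ [0,j] with prefix A, [k+1,j-1] ⊆ T+T and j ∉ T+T."""
    need = set(range(k + 1, j))
    return sum(1 for T in with_prefix(A, k, j)
               if need <= sumset(T) and j not in sumset(T))


def sigma_n_via_G(A, k, n):
    """Right-hand side of sigma_n(A;k) = 2^{-k} - sum_{j=k+1}^{n} G_j(A;k) 2^{-j}."""
    return Fraction(1, 2 ** k) - sum(Fraction(G(j, A, k), 2 ** j)
                                     for j in range(k + 1, n + 1))


for n in range(3, 11):
    assert sigma_n({0, 1, 3}, 3, n) == sigma_n_via_G({0, 1, 3}, 3, n)
```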

2.3 Rich MSTD sets with a given MSTD fringe pair 

Fix an MSTD fringe pair $(A,B;k)$. As $n \to \infty$, what proportion of the subsets of $[0,n]$ are rich MSTD sets with MSTD fringe pair $(A,B;k)$? In this section, we show that the answer is simply the product of the proportions of $k$-semi-rich sets with prefix $(A;k)$ and $(B;k)$ respectively.

The intuition here is that, for large $n$ and a uniform random subset $S \subseteq [0,n]$, with very high probability every element in $[n/2, 3n/2]$ appears in the sum set $S+S$. So we are mostly concerned with ensuring that each half of $S$ is semi-rich.
For an MSTD fringe pair $(A,B;k)$ and an integer $n > 2k$, let

$$\rho_n(A,B;k) = 2^{-n+1}\,\big|\{S \subseteq [0,n] : S\cap[0,k] = A,\ (n-S)\cap[0,k] = B,\ [k+1,2n-k-1] \subseteq S+S\}\big|. \tag{7}$$

In other words, $\rho_n(A,B;k)$ is the probability that a uniformly chosen random subset $S \subseteq [0,n]$ (conditioned on $0, n \in S$) is a rich MSTD set with MSTD fringe pair $(A,B;k)$. The following proposition formalizes the above intuition.

Proposition 2.10. As $n \to \infty$, $\rho_n(A,B;k)$ approaches a limit $\rho(A,B;k)$, and

$$\rho(A,B;k) = \sigma(A;k)\,\sigma(B;k).$$

Proof. In this proof, assume that $n$ is sufficiently large. Let $m = \lfloor n/2 \rfloor$. If a subset $S \subseteq [0,n]$ is a rich MSTD set with MSTD fringe pair $(A,B;k)$, then $S\cap[0,m]$ is a $k$-semi-rich subset of $[0,m]$ with prefix $(A;k)$, and $(n-S)\cap[0,n-m-1]$ is a $k$-semi-rich subset of $[0,n-m-1]$ with prefix $(B;k)$. Since $S$ is determined by these two pieces, we have

$$\rho_n(A,B;k)\,2^{n-1} \le \sigma_m(A;k)\,2^m \cdot \sigma_{n-m-1}(B;k)\,2^{n-m-1} = \sigma_m(A;k)\,\sigma_{n-m-1}(B;k)\,2^{n-1}. \tag{8}$$

The difference $\sigma_m(A;k)\sigma_{n-m-1}(B;k)2^{n-1} - \rho_n(A,B;k)2^{n-1}$ counts a collection of subsets of $[0,n]$ which, among other things, have the property that some element of $[m+1,n+m]$ is missing from $S+S$. It is easy to see that the number of subsets $S \subseteq [0,n]$ satisfying $j \notin S+S$ is precisely $3^{\lfloor (j'+1)/2 \rfloor} \cdot 2^{n-j'}$, where $j' = j$ if $0 \le j \le n$ and $j' = 2n-j$ if $n < j \le 2n$. So, if $j \in [m+1, n+m]$, then $j' \ge m$, and the number of subsets $S \subseteq [0,n]$ with $j \notin S+S$ is at most $\sqrt{3}\,2^n(\sqrt{3}/2)^m \le 3^{n/4}2^{n/2+1}$ (recall that $m = \lfloor n/2 \rfloor$). Therefore,

$$\sigma_m(A;k)\sigma_{n-m-1}(B;k)2^{n-1} - \rho_n(A,B;k)2^{n-1} \le \big|\{S \subseteq [0,n] : [m+1,n+m] \not\subseteq S+S\}\big| \le n\cdot 3^{n/4}2^{n/2+1}. \tag{9}$$

Combining (8) and (9), we obtain

$$\sigma_m(A;k)\,\sigma_{n-m-1}(B;k) - n\cdot 3^{n/4}2^{-n/2+2} \le \rho_n(A,B;k) \le \sigma_m(A;k)\,\sigma_{n-m-1}(B;k).$$

Letting $n \to \infty$ (note that $3^{1/4}/2^{1/2} < 1$) gives

$$\lim_{n\to\infty} \rho_n(A,B;k) = \lim_{n\to\infty} \sigma_{\lfloor n/2\rfloor}(A;k)\,\sigma_{n-\lfloor n/2\rfloor-1}(B;k) = \sigma(A;k)\,\sigma(B;k),$$

thereby proving the proposition. □
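The count $3^{\lfloor (j'+1)/2\rfloor}2^{n-j'}$ used in the proof above comes from splitting $[0,j']$ into the pairs $\{a, j'-a\}$ (three allowed patterns each, with $j'/2$ excluded when $j'$ is even) and leaving the other $n-j'$ elements free; the case $j > n$ follows by reflecting $S \mapsto n - S$. A small brute-force check in Python (function names are our own):

```python
from itertools import combinations


def count_missing_sum(n, j):
    """Brute force: number of S ⊆ [0, n] with j not in S + S."""
    total = 0
    for r in range(n + 2):
        for S in combinations(range(n + 1), r):
            members = set(S)
            if not any((j - a) in members for a in members):
                total += 1
    return total


def count_formula(n, j):
    """3^{floor((j'+1)/2)} * 2^{n - j'}, with j' = j for j <= n and 2n - j otherwise."""
    jp = j if j <= n else 2 * n - j
    return 3 ** ((jp + 1) // 2) * 2 ** (n - jp)


print(all(count_missing_sum(8, j) == count_formula(8, j) for j in range(17)))  # True
```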
2.4 Almost all MSTD sets are rich



Previously we considered the proportion of rich MSTD sets with a particular MSTD fringe pair. 
By summing over all minimal MSTD fringe pairs, we obtain the proportion of rich MSTD sets. 
In this section, we show that, in some sense, almost all MSTD sets are rich, so that the limiting 
proportion of MSTD sets equals the limiting proportion of rich MSTD sets. 

The intuition, as before, is that there is a diminishingly small probability that any "middle" 
sum or difference is missing. We can quantify this observation through the following two lemmas. 

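Lemma 2.11. Let $S$ be a uniform random subset of $[0,n]$ containing 0 and $n$.

(a) If $s \in [1, n-1]$, then

$$\mathbb{P}\{s \notin S+S\} = \begin{cases} \frac{1}{2}\left(\frac{3}{4}\right)^{(s-1)/2} & \text{if } s \text{ is odd},\\[4pt] \frac{1}{4}\left(\frac{3}{4}\right)^{(s-2)/2} & \text{if } s \text{ is even}. \end{cases}$$

And if $s \in [n+1, 2n-1]$, then $\mathbb{P}\{s \notin S+S\} = \mathbb{P}\{2n-s \notin S+S\}$.

(b) If $d$ is an integer with $n/2 < d < n$, then

$$\mathbb{P}\{d \notin S-S\} = \frac{1}{4}\left(\frac{3}{4}\right)^{n-d-1}.$$

If $0 < d \le n/2$, then

$$\mathbb{P}\{d \notin S-S\} \le \left(\frac{3}{4}\right)^{(n-1)/3}.$$

Finally, $\mathbb{P}\{d \notin S-S\} = \mathbb{P}\{-d \notin S-S\}$.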



We omit the easy proof of Lemma 2.11 since very similar results can be found in [5, Sec. 2]. We also used similar ideas in the proof of Proposition 2.10.
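The closed forms in Lemma 2.11, namely $\frac12(3/4)^{(s-1)/2}$ for odd $s$, $\frac14(3/4)^{(s-2)/2}$ for even $s$, $\frac14(3/4)^{n-d-1}$ for $n/2 < d < n$, and the bound $(3/4)^{(n-1)/3}$ for $0 < d \le n/2$, can be checked exactly for a small $n$ by enumerating all subsets containing 0 and $n$. The Python sketch below does this for $n = 10$; the function names are our own.

```python
from fractions import Fraction
from itertools import product


def prob_missing(n, x, kind):
    """Exact P{x not in S+S} (kind="sum") or P{x not in S-S} (kind="diff"),
    where S is uniform among the subsets of [0, n] containing 0 and n."""
    hits = 0
    for bits in product((0, 1), repeat=n - 1):
        S = {0, n} | {i + 1 for i, b in enumerate(bits) if b}
        if kind == "sum":
            present = any((x - a) in S for a in S)
        else:
            present = any((a + x) in S for a in S)  # x = (a + x) - a
        hits += not present
    return Fraction(hits, 2 ** (n - 1))


def sum_formula(s):
    """Lemma 2.11(a), for s in [1, n-1]."""
    q = Fraction(3, 4)
    if s % 2 == 1:
        return Fraction(1, 2) * q ** ((s - 1) // 2)
    return Fraction(1, 4) * q ** ((s - 2) // 2)


def diff_formula(n, d):
    """Lemma 2.11(b), for n/2 < d < n."""
    return Fraction(1, 4) * Fraction(3, 4) ** (n - d - 1)


n = 10
print(all(prob_missing(n, s, "sum") == sum_formula(s) for s in range(1, n)))  # True
```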

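Lemma 2.12. Let $n$ and $k$ be positive integers with $n > 2k$. Let $S$ be a uniform random subset of $[0,n]$ containing 0 and $n$. Then

$$\mathbb{P}\{[k+1,2n-k-1] \not\subseteq S+S\} < \frac{(3/4)^{k/2}}{1-\sqrt{3}/2},$$

and

$$\mathbb{P}\{[-n+k+1,n-k-1] \not\subseteq S-S\} < 2\left(\frac{3}{4}\right)^{k} + (n+1)\left(\frac{3}{4}\right)^{(n-1)/3}.$$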

Proof. In each case, apply the union bound, use Lemma 2.11, and then sum a geometric series. □

We also state a variation of Lemma 2.12 where we drop the restriction that $S$ contains 0 and $n$. The proof is very similar, so we omit it. This lemma will be used in later sections.
Lemma 2.13. Let $n$ and $k$ be positive integers with $n > 2k$. Let $S$ be a uniform random subset of $[0,n]$. Then

$$\mathbb{P}\{[k+1,2n-k-1] \not\subseteq S+S\} < \frac{3(3/4)^{k/2}}{1-\sqrt{3}/2},$$

and

$$\mathbb{P}\{[-n+k+1,n-k-1] \not\subseteq S-S\} < 8\left(\frac{3}{4}\right)^{k+2} + (n+1)\left(\frac{3}{4}\right)^{(n-1)/3}.$$



The take-away point from the above two lemmas is that by forcing $k$ to be large, we can make the probability that any middle sum or difference is missing negligible. In other words,

$$\lim_{k\to\infty}\limsup_{n\to\infty}\mathbb{P}\{[k+1,2n-k-1] \not\subseteq S+S\} = 0,$$
$$\lim_{k\to\infty}\limsup_{n\to\infty}\mathbb{P}\{[-n+k+1,n-k-1] \not\subseteq S-S\} = 0.$$



Now we state the result that formalizes the statement that "almost all MSTD sets are rich." For now, we restrict ourselves to MSTD sets $S \subseteq [0,n]$ containing 0 and $n$. Let

$$\rho_{*,n} = 2^{-n+1}\,\big|\{S \subseteq [0,n] : 0, n \in S, \text{ and } S \text{ is MSTD}\}\big|.$$

We put the asterisk in the subscript to indicate that $0, n \in S$, because we need to reserve the superscript for later.

Proposition 2.14. As $n \to \infty$, $\rho_{*,n}$ converges to a limit $\rho_*$, and

$$\rho_* = \sum_{(A,B;k)} \rho(A,B;k),$$

where the sum is taken over all minimal MSTD fringe pairs $(A,B;k)$.

Proof. Fix a positive integer $K$. We start by considering only MSTD fringe pairs of order at most $K$. In the last step of the proof we let $K \to \infty$.

Assume that $n$ is sufficiently large. If $S$ is a uniform random subset of $[0,n]$ containing 0 and $n$, then $\rho_{*,n}$ is the probability that $S$ is MSTD. Since rich MSTD sets of order at most $K$ form a subset of all MSTD sets, and by Lemma 2.7 each of them is counted by exactly one minimal MSTD fringe pair, we have

$$\sum_{\substack{(A,B;k)\\ k \le K}} \rho_n(A,B;k) \le \rho_{*,n}. \tag{10}$$

Unless otherwise specified, such sums are always assumed to be taken over minimal MSTD fringe pairs. Note that the sum has finitely many terms.

Let $S \subseteq [0,n]$ be an MSTD set containing 0 and $n$. Let $A = S\cap[0,K]$ and $B = (n-S)\cap[0,K]$ be the fringe sets as usual. Suppose that $S$ is not a rich MSTD set of order at most $K$ (meaning that either $S$ is not rich, or $S$ is rich with order greater than $K$). There are two possibilities.

Case 1. $(A,B;K)$ is not an MSTD fringe pair. Then

$$|(A+A)\cap[0,K]| + |(B+B)\cap[0,K]| \le 2|(A+B)\cap[0,K]|.$$

Since $S$ is an MSTD set, $S-S$ must be missing some difference in $[-n+K+1,n-K-1]$ (cf. the proof of Lemma 2.5): otherwise the middle differences would be at least as numerous as the middle sums, and the fringe differences at least as numerous as the fringe sums, giving $|S-S| \ge |S+S|$.

Case 2. $(A,B;K)$ is an MSTD fringe pair, but $S$ is not a rich MSTD set with MSTD fringe pair $(A,B;K)$, i.e., $S+S$ is missing some sum in $[K+1,2n-K-1]$.

In both cases, $S$ is missing a middle sum or a middle difference. By Lemma 2.12, we have

$$\begin{aligned}
0 \le \rho_{*,n} - \sum_{\substack{(A,B;k)\\ k \le K}} \rho_n(A,B;k)
&\le \mathbb{P}\{[K+1,2n-K-1] \not\subseteq S+S\} + \mathbb{P}\{[-n+K+1,n-K-1] \not\subseteq S-S\}\\
&< \frac{(3/4)^{K/2}}{1-\sqrt{3}/2} + 2\left(\frac{3}{4}\right)^{K} + (n+1)\left(\frac{3}{4}\right)^{(n-1)/3}.
\end{aligned}$$

Letting $n \to \infty$ and using Proposition 2.10, we get

$$\sum_{\substack{(A,B;k)\\ k \le K}} \rho(A,B;k) \le \liminf_{n\to\infty} \rho_{*,n} \le \limsup_{n\to\infty} \rho_{*,n} \le \sum_{\substack{(A,B;k)\\ k \le K}} \rho(A,B;k) + \frac{(3/4)^{K/2}}{1-\sqrt{3}/2} + 2\left(\frac{3}{4}\right)^{K}.$$

Letting $K \to \infty$, we get

$$\rho_* = \lim_{n\to\infty} \rho_{*,n} = \sum_{(A,B;k)} \rho(A,B;k).$$

□
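To see why the upper bound on $\rho$ is hard to make sharp, one can evaluate the error term that survives the limit $n \to \infty$ in the proof above, $(3/4)^{K/2}/(1-\sqrt{3}/2) + 2(3/4)^K$, for a few values of $K$. The Python sketch below assumes exactly this form of the bound (from Lemma 2.12 as stated above); the function name is our own.

```python
import math


def middle_error_bound(K):
    """Error term left after n -> infinity in the proof of Proposition 2.14:
    (3/4)^(K/2) / (1 - sqrt(3)/2) + 2 * (3/4)^K."""
    return 0.75 ** (K / 2) / (1 - math.sqrt(3) / 2) + 2 * 0.75 ** K


for K in (6, 10, 20, 40, 80):
    print(K, middle_error_bound(K))
```

The bound exceeds 1 at $K = 6$ and only decays like $(\sqrt{3}/2)^K$, so fringe pairs of small order alone cannot yield a useful upper bound on $\rho$.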



2.5 The proportion of MSTD sets 

In this section we remove the restriction that $0, n \in S$. Recall that $\rho_n$ is the probability that a uniform random subset of $[0,n]$ is an MSTD set.

Lemma 2.15. $\lim_{n\to\infty} \rho_n = \rho_*$.

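Proof. Fix $\varepsilon > 0$. Choose $N$ so that $|\rho_{*,m} - \rho_*| < \varepsilon$ for all $m \ge N/3$. Let $S$ be a random subset of $[0,n]$, where $n \ge N$. Let $E$ denote the event that $\min S \le n/3$ and $\max S \ge 2n/3$. So $\mathbb{P}(E) = (1 - 2^{-\lfloor n/3 \rfloor - 1})^2$. If $E$ occurs, then the probability that $S$ is MSTD is $\varepsilon$-close to $\rho_*$: conditioned on $\min S = a$ and $\max S = b$, the translate $S - a$ is a uniform random subset of $[0, b-a]$ containing 0 and $b-a$, where $b - a \ge n/3 \ge N/3$, and being MSTD is invariant under translation. It follows that

$$(1 - 2^{-\lfloor n/3 \rfloor - 1})^2(\rho_* - \varepsilon) \le \rho_n \le (1 - 2^{-\lfloor n/3 \rfloor - 1})^2(\rho_* + \varepsilon) + 1 - (1 - 2^{-\lfloor n/3 \rfloor - 1})^2$$

for $n \ge N$. Letting $n \to \infty$, we get

$$\rho_* - \varepsilon \le \liminf_{n\to\infty} \rho_n \le \limsup_{n\to\infty} \rho_n \le \rho_* + \varepsilon.$$

Since $\varepsilon$ can be made arbitrarily small, we have $\lim_{n\to\infty} \rho_n = \rho_*$. □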
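The formula for $\mathbb{P}(E)$ in the proof above can be confirmed by enumeration for small $n$: the two conditions concern the disjoint sets $[0,\lfloor n/3\rfloor]$ and $[\lceil 2n/3\rceil, n]$, each of size $\lfloor n/3\rfloor + 1$. A minimal Python check (function names are our own):

```python
from fractions import Fraction
from itertools import product


def prob_E_bruteforce(n):
    """P(min S <= n/3 and max S >= 2n/3) for a uniform random S ⊆ [0, n]."""
    hits = 0
    for bits in product((0, 1), repeat=n + 1):
        S = [i for i, b in enumerate(bits) if b]
        # Integer comparisons avoid floating-point thirds.
        if S and 3 * S[0] <= n and 3 * S[-1] >= 2 * n:
            hits += 1
    return Fraction(hits, 2 ** (n + 1))


def prob_E_formula(n):
    return (1 - Fraction(1, 2 ** (n // 3 + 1))) ** 2


print(all(prob_E_bruteforce(n) == prob_E_formula(n) for n in range(1, 13)))  # True
```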

Combining Propositions 2.10 and 2.14 and Lemma 2.15, we obtain the following formula for the density of MSTD sets.

Proposition 2.16. The density of MSTD sets satisfies

$$\rho = \lim_{n\to\infty} \rho_n = \sum_{(A,B;k)} \rho(A,B;k) = \sum_{(A,B;k)} \sigma(A;k)\,\sigma(B;k),$$

where the sum is taken over all minimal MSTD fringe pairs $(A,B;k)$.

In particular, we have proven the existence of the limit in Theorem 1.2. Proposition 2.16 also gives the formula that we will use to compute $\rho$.

3 Computing the limit 

In this section we explain how to compute lower and upper bounds for $\rho$. Our method could, in principle, be used to derive bounds of arbitrary precision, although in practice the computation time increases exponentially with the desired precision. We start with a description of the method used to compute estimates of $\rho$. Our numerical results can be found at the end of this section.

Our computation consists of the following steps. The functions $\sigma$ and $\rho$ were defined in Sections 2.2 and 2.3, respectively.

1. Fix $K$. Find all minimal MSTD fringe pairs of order up to $K$.

2. For each $(A,B;k)$ found in step 1, compute lower and upper bounds for $\sigma(A;k)$ and $\sigma(B;k)$.

3. Add up the lower and upper bounds for $\rho(A,B;k) = \sigma(A;k)\sigma(B;k)$ for all $(A,B;k)$ found in step 1.

The variables $K$, $J$, and $h_k$ are all computational parameters, viewed as inputs to the computation. Each variable represents the extent of some complete search. In general, larger values of these parameters give better numerical results but also increase the running time.
3.1 Generating minimal MSTD fringe pairs

All the minimal MSTD fringe pairs of order $k$ can be generated by a complete search through all pairs of subsets of $[0,k]$, for each $k$ up to $K$. That is, we generate a list of all pairs of subsets $A, B \subseteq [0,k]$ satisfying

• $0 \in A$, $0 \in B$;

• $|(A+A)\cap[0,k]| + |(B+B)\cap[0,k]| > 2|(A+B)\cap[0,k]|$;

• the following statements are not all true: $k \in A+A$, $k \in B+B$, $|(A+A)\cap[0,k-1]| + |(B+B)\cap[0,k-1]| > 2|(A+B)\cap[0,k-1]|$.

The first two items correspond to $(A,B;k)$ being an MSTD fringe pair, while the third item corresponds to minimality.
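The three conditions above translate directly into a complete search over the $4^k$ pairs of subsets of $[0,k]$ containing 0. This Python sketch is only an illustration (names are our own, and it makes no attempt at the pruning a serious computation would need); note that sums up to $k-1$ never involve the element $k$, so the third condition can be evaluated on $A$ and $B$ directly.

```python
from itertools import combinations


def trunc_sumset(X, Y, k):
    """(X + Y) ∩ [0, k]."""
    return {x + y for x in X for y in Y if x + y <= k}


def excess(A, B, k):
    """|(A+A)∩[0,k]| + |(B+B)∩[0,k]| - 2|(A+B)∩[0,k]|."""
    return (len(trunc_sumset(A, A, k)) + len(trunc_sumset(B, B, k))
            - 2 * len(trunc_sumset(A, B, k)))


def subsets_containing_zero(k):
    for r in range(k + 1):
        for c in combinations(range(1, k + 1), r):
            yield frozenset((0,) + c)


def minimal_fringe_pairs(k):
    """All minimal MSTD fringe pairs (A, B; k) of order exactly k."""
    subs = list(subsets_containing_zero(k))
    found = []
    for A in subs:
        for B in subs:
            if excess(A, B, k) <= 0:
                continue  # not an MSTD fringe pair
            if k >= 1 and (k in trunc_sumset(A, A, k)
                           and k in trunc_sumset(B, B, k)
                           and excess(A, B, k - 1) > 0):
                continue  # dominates a fringe pair of order k - 1
            found.append((A, B))
    return found


pairs6 = set(minimal_fringe_pairs(6))
print((frozenset({0}), frozenset({0, 1, 3})) in pairs6)  # True
```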

3.2 Estimating $\sigma(A;k)$

Recall that $\sigma(A;k)$ is the density of semi-rich sets with prefix $(A;k)$. The methods used here to compute lower and upper bounds to $\sigma(A;k)$ build on the results developed earlier in Section 2.2. The key formula is (6), which we reproduce here for convenience:

$$\sigma(A;k) = 2^{-k} - \sum_{j=k+1}^{\infty} G_j(A;k)\,2^{-j}, \tag{12}$$

where

$$G_j(A;k) = \big|\{T \subseteq [0,j] : T\cap[0,k] = A,\ [k+1,j-1] \subseteq T+T,\ j \notin T+T\}\big|.$$
The computation consists of the following steps. Here $J$ is a computational parameter.

1. Compute the terms $G_j(A;k)$ in (12) for all $j$ satisfying $k < j \le J$ to obtain an upper bound to $\sigma(A;k)$ by using a partial sum.

2. Upper bound the trailing sum $\sum_{j>J} G_j(A;k)2^{-j}$ in (12) to obtain a lower bound to $\sigma(A;k)$.

In this section, we describe how to produce two numbers $\sigma_-(A;k)$ and $\sigma_+(A;k)$ such that

$$\sigma_-(A;k) \le \sigma(A;k) \le \sigma_+(A;k).$$

By increasing our computational parameters, we could, in principle, make the two estimates $\sigma_-(A;k)$ and $\sigma_+(A;k)$ arbitrarily close to the true value $\sigma(A;k)$. Unfortunately, the cost of computation increases prohibitively with the desired precision.
3.2.1 Upper estimate of $\sigma(A;k)$

Each individual term $G_j(A;k)$ can be computed by a complete search. For each prefix $(A;k)$ arising from a minimal MSTD fringe pair, we compute $G_j(A;k)$ for all $j$ satisfying $k < j \le J$. Our upper bound to $\sigma(A;k)$ is then given by

$$\sigma_+(A;k) = 2^{-k} - \sum_{j=k+1}^{J} G_j(A;k)\,2^{-j}. \tag{13}$$
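One way to compute all the terms $G_j(A;k)$, $k < j \le J$, in a single pass (an assumption of this sketch, not necessarily how the authors implemented their search) is to grow the family of semi-rich prefixes one element at a time. This amounts to Lemma 3.3 below with $h = j-1$: a semi-rich prefix $B \subseteq [0,j-1]$ contributes to $G_j(A;k)$ exactly when $j \notin B+B$ and $j$ is left out of $T$. After step $J$ the frontier has $2^J\sigma_J(A;k)$ members, so $\sigma_+(A;k)$ in (13) equals $\sigma_J(A;k)$.

```python
from fractions import Fraction


def in_sumset(x, B):
    """True if x = b1 + b2 for some b1, b2 in B."""
    return any((x - b) in B for b in B)


def semi_rich_search(A, k, J):
    """Hypothetical helper: return ({j: G_j(A;k) for k < j <= J}, frontier),
    where frontier lists every B ⊆ [0, J] with B ∩ [0,k] = A and
    [k+1, J] ⊆ B + B."""
    frontier = [frozenset(A)]
    G = {}
    for j in range(k + 1, J + 1):
        # T ∩ [0, j-1] = B is semi-rich; j is missing from T + T exactly when
        # j is not in B + B and j itself is left out of T (0 + j would give j).
        G[j] = sum(1 for B in frontier if not in_sumset(j, B))
        frontier = ([B | {j} for B in frontier]
                    + [B for B in frontier if in_sumset(j, B)])
    return G, frontier


def sigma_plus(A, k, J):
    """Upper estimate (13): 2^{-k} - sum_{k<j<=J} G_j(A;k) 2^{-j}."""
    G, _ = semi_rich_search(A, k, J)
    return Fraction(1, 2 ** k) - sum(Fraction(g, 2 ** j) for j, g in G.items())


G, frontier = semi_rich_search({0}, 0, 3)
print(G)  # {1: 1, 2: 0, 3: 1}
```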

3.2.2 Lower estimate of $\sigma(A;k)$

To determine a lower estimate of $\sigma(A;k)$, we need an effective upper bound for the trailing terms in (12):

$$\sum_{j>J} G_j(A;k)\,2^{-j}. \tag{14}$$

In computing an upper bound to (14), we do not explicitly compute the exact values of any additional $G_j(A;k)$ terms. We obtain an upper bound through the following series of lemmas:

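Lemma 3.1. Let $A \subseteq [0,k]$. If $2k < j$, then

$$G_j(A;k) \le 2^{k+1-|A|} \cdot 3^{\lfloor (j-2k-1)/2 \rfloor}, \tag{15}$$

and if $k < j \le 2k$, then $G_j(A;k) = 0$ if $j \in A+A$, and otherwise

$$G_j(A;k) \le 2^{\,j-k-|A\cap[0,j-k-1]|}. \tag{16}$$

Proof. If $j \in A+A$, then $j \in T+T$ for every admissible $T$, so $G_j(A;k) = 0$. In the remaining cases, the bound simply uses the fact that

$$G_j(A;k) \le \big|\{T \subseteq [0,j] : T\cap[0,k] = A,\ j \notin T+T\}\big|. \tag{17}$$

It can be easily checked that the RHS of (17) equals the RHS expression in (15) and (16) in the respective cases. For $j > 2k$, each $a \in A$ forces $j-a \notin T$, each $j-a$ with $a \in [0,k]\setminus A$ is free, and the $j-2k-1$ elements of $[k+1,j-k-1]$ form $\lfloor (j-2k-1)/2 \rfloor$ pairs $\{b, j-b\}$ with three allowed patterns each (when $j$ is even, $j/2$ is excluded). For $k < j \le 2k$ with $j \notin A+A$, exactly the elements $j-a$ with $a \in A\cap[0,j-k-1]$ are excluded from $[k+1,j]$, and the rest are free. □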
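The bounds of Lemma 3.1 can be compared against brute-force values of $G_j(A;k)$ and of the relaxed count in (17). In the Python sketch below (names are our own, and the two test prefixes are arbitrary), the relaxed count agrees with the stated bound exactly, as the proof asserts.

```python
from itertools import product


def with_prefix(A, k, j):
    """Yield every T ⊆ [0, j] with T ∩ [0, k] = A."""
    for bits in product((0, 1), repeat=j - k):
        yield set(A) | {k + 1 + i for i, b in enumerate(bits) if b}


def has_sum(x, T):
    return any((x - t) in T for t in T)


def G_exact(A, k, j):
    return sum(1 for T in with_prefix(A, k, j)
               if all(has_sum(x, T) for x in range(k + 1, j))
               and not has_sum(j, T))


def relaxed_count(A, k, j):
    """Right-hand side of (17): drop the condition [k+1, j-1] ⊆ T + T."""
    return sum(1 for T in with_prefix(A, k, j) if not has_sum(j, T))


def lemma_3_1_bound(A, k, j):
    """(15) when j > 2k; otherwise 0 if j ∈ A + A, else (16)."""
    A = set(A)
    if j > 2 * k:
        return 2 ** (k + 1 - len(A)) * 3 ** ((j - 2 * k - 1) // 2)
    if has_sum(j, A):
        return 0
    return 2 ** (j - k - len({a for a in A if a <= j - k - 1}))


for A, k in (({0, 2}, 3), ({0, 1, 3}, 3)):
    for j in range(k + 1, 13):
        assert G_exact(A, k, j) <= lemma_3_1_bound(A, k, j) == relaxed_count(A, k, j)
```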

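Lemma 3.2. Let $A \subseteq [0,k]$ and $2k < \ell$. Then

$$\sum_{j \ge \ell} G_j(A;k)\,2^{-j} \le \begin{cases} 2^{k+2-|A|-\ell} \cdot 3^{-k+(\ell+1)/2} & \text{if } \ell \text{ is odd},\\[4pt] 5 \cdot 2^{k+2-|A|-\ell} \cdot 3^{-k-1+\ell/2} & \text{if } \ell \text{ is even}. \end{cases}$$

Proof. This follows from applying Lemma 3.1 to each term in the infinite sum, and then summing a geometric series:

$$\sum_{j \ge \ell} G_j(A;k)\,2^{-j} \le \sum_{j \ge \ell} 2^{k+1-|A|-j} \cdot 3^{\lfloor (j-2k-1)/2 \rfloor} = \frac{2^{k+1-|A|-\ell}\,3^{\lfloor (\ell-2k-1)/2 \rfloor} + 2^{k-|A|-\ell}\,3^{\lfloor (\ell-2k)/2 \rfloor}}{1 - 3/4}.$$

The last expression above equals the upper bound given in the lemma. □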
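A quick numerical check of Lemma 3.2: summing the bound (15) term by term with exact fractions approaches the closed form from below. The function names, the sample parameters ($|A| = 3$, $k = 6$), and the truncation after 400 terms are assumptions of this sketch.

```python
from fractions import Fraction


def lemma_3_1_term(a, k, j):
    """Bound (15) on G_j(A;k), with a = |A| and j > 2k, times 2^{-j}."""
    return Fraction(2 ** (k + 1 - a) * 3 ** ((j - 2 * k - 1) // 2), 2 ** j)


def lemma_3_2_bound(a, k, ell):
    """Closed form of sum_{j >= ell} lemma_3_1_term(a, k, j), for ell > 2k."""
    assert ell > 2 * k
    if ell % 2 == 1:
        return Fraction(2 ** (k + 2 - a) * 3 ** ((ell + 1) // 2 - k), 2 ** ell)
    return Fraction(5 * 2 ** (k + 2 - a) * 3 ** (ell // 2 - k - 1), 2 ** ell)


a, k = 3, 6
for ell in (13, 14, 20, 21):
    partial = sum(lemma_3_1_term(a, k, j) for j in range(ell, ell + 400))
    gap = lemma_3_2_bound(a, k, ell) - partial
    # The omitted tail is about (3/4)^200 of the total.
    assert 0 < gap < lemma_3_2_bound(a, k, ell) / 10 ** 20
```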
Lemma 3.2 is sufficient to provide an upper bound to (14). However, the bound turns out to be somewhat weak. In theory, we already have the tools to evaluate the limit in (6) to arbitrary precision. In practice, we would like a more efficient way of upper bounding the trailing error terms (14). This issue is handled by the following lemma.

Lemma 3.3. Let $k < h < j$ and $A \subseteq [0,k]$. Let $\mathcal{B}_h(A;k)$ denote the set of all $B \subseteq [0,h]$ satisfying $B\cap[0,k] = A$ and $[k+1,h] \subseteq B+B$. Then

$$G_j(A;k) = \sum_{B\in\mathcal{B}_h(A;k)} G_j(B;h).$$

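Proof. Since $h < j$, the condition $[k+1,h]\subseteq T+T$ depends only on $B = T\cap[0,h]$. The lemma therefore follows from taking the cardinality of both sides of the disjoint union

$$\{T \subseteq [0,j] : T\cap[0,k] = A,\ [k+1,j-1]\subseteq T+T,\ j\notin T+T\} = \biguplus_{B\in\mathcal{B}_h(A;k)} \{T\subseteq[0,j] : T\cap[0,h] = B,\ [h+1,j-1]\subseteq T+T,\ j\notin T+T\}. \qquad\square$$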
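The decomposition in Lemma 3.3 can be confirmed numerically for small parameters. The sketch below is our own; `B_h` is a hypothetical helper name for $\mathcal{B}_h(A;k)$. It enumerates $\mathcal{B}_h(A;k)$ by complete search and compares both sides of the identity.

```python
from itertools import combinations

def subsets(lo, hi):
    """Yield every subset of the integer interval [lo, hi] as a frozenset."""
    elems = range(lo, hi + 1)
    for r in range(len(elems) + 1):
        for combo in combinations(elems, r):
            yield frozenset(combo)

def sumset(T):
    return {a + b for a in T for b in T}

def G(A, k, j):
    """Brute-force G_j(A;k), as in the proof of Lemma 3.3."""
    A = frozenset(A)
    count = 0
    for X in subsets(k + 1, j):
        s2 = sumset(A | X)
        if j not in s2 and all(x in s2 for x in range(k + 1, j)):
            count += 1
    return count

def B_h(A, k, h):
    """All B subset of [0, h] with B ∩ [0, k] = A and [k+1, h] contained in B + B."""
    A = frozenset(A)
    result = []
    for X in subsets(k + 1, h):
        s2 = sumset(A | X)
        if all(x in s2 for x in range(k + 1, h + 1)):
            result.append(A | X)
    return result

def lemma33_rhs(A, k, h, j):
    """Right-hand side of Lemma 3.3 (requires k < h < j)."""
    return sum(G(B, h, j) for B in B_h(A, k, h))
```

For example, with $k=2$ and $h=4$, `G(A, 2, j) == lemma33_rhs(A, 2, 4, j)` holds for every $A$ and every $j$ from 5 to 9 that we tried.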

We will use Lemma 3.3 in a way that allows $h$ to vary with $k$. Let $h_k$ be a computational parameter, one for each $k$.

Our method of computing the upper bound to (14) combines Lemmas 3.1, 3.2, and 3.3. Let $\widehat G_j(A;k)$ denote the upper bound to $G_j(A;k)$ given in Lemma 3.1, and denote the upper bound in Lemma 3.2 by

$$\widetilde G_\ell(A;k) = \begin{cases} 2^{k+2-|A|-\ell}\cdot 3^{-k+\frac{\ell+1}{2}} & \text{if } \ell \text{ is odd},\\[4pt] 5\cdot 2^{k+2-|A|-\ell}\cdot 3^{-k-1+\frac{\ell}{2}} & \text{if } \ell \text{ is even}.\end{cases}$$



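Then, for any $h \le J$ and $B \subseteq [0,h]$, we have

$$\sum_{j>J} G_j(B;h)\,2^{-j} \le \widetilde G_{\max\{2h+1,\,J+1\}}(B;h) + \sum_{J<j\le 2h}\widehat G_j(B;h)\,2^{-j}.$$

Assuming $k < h_k \le J$, Lemma 3.3 applies to every $j > J$, and so

$$\sum_{j>J} G_j(A;k)\,2^{-j} = \sum_{B\in\mathcal{B}_{h_k}(A;k)}\ \sum_{j>J} G_j(B;h_k)\,2^{-j} \le \sum_{B\in\mathcal{B}_{h_k}(A;k)} \Bigl(\widetilde G_{\max\{2h_k+1,\,J+1\}}(B;h_k) + \sum_{J<j\le 2h_k}\widehat G_j(B;h_k)\,2^{-j}\Bigr). \tag{18}$$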



Our lower estimate to $\sigma(A;k)$ is

$$\sigma_-(A;k) = \sigma_+(A;k) - \sum_{B\in\mathcal{B}_{h_k}(A;k)} \Bigl(\widetilde G_{\max\{2h_k+1,\,J+1\}}(B;h_k) + \sum_{J<j\le 2h_k}\widehat G_j(B;h_k)\,2^{-j}\Bigr). \tag{19}$$

Then $\sigma_-(A;k) \le \sigma(A;k)$. The computation of $\sigma_-(A;k)$ does not involve computing any terms $G_j(A;k)$ other than the ones already used while computing $\sigma_+(A;k)$. We do perform a complete search to determine each $\mathcal{B}_{h_k}(A;k)$. This is still much faster than computing additional $G_j(A;k)$ terms exactly to obtain bounds of the same quality.
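To see how (13), (18) and (19) fit together, here is an end-to-end sketch for toy parameters. It is our own Python illustration, not the Java program used for Table 1, and all function names are hypothetical. It assumes the normalization $\sigma_+(A;k) = 2^{-k} - \sum_{j=k+1}^{J} G_j(A;k)2^{-j}$ as written in (13), and it requires $k < h \le J$. Like the paper's computation, it uses exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations

def subsets(lo, hi):
    """Yield every subset of the integer interval [lo, hi] as a frozenset."""
    elems = range(lo, hi + 1)
    for r in range(len(elems) + 1):
        for combo in combinations(elems, r):
            yield frozenset(combo)

def sumset(T):
    return {a + b for a in T for b in T}

def G(A, k, j):
    """Exact G_j(A;k) by complete search over T ∩ [k+1, j]."""
    count = 0
    for X in subsets(k + 1, j):
        s2 = sumset(A | X)
        if j not in s2 and all(x in s2 for x in range(k + 1, j)):
            count += 1
    return count

def G_hat(A, k, j):
    """Lemma 3.1 upper bound on G_j(A;k), valid for j > k."""
    if j > 2 * k:
        return 2 ** (k + 1 - len(A)) * 3 ** ((j - 2 * k - 1) // 2)
    if j in sumset(A):
        return 0
    return 2 ** (j - k - len(A & set(range(j - k))))

def G_tilde(A, k, ell):
    """Lemma 3.2 bound on sum_{j >= ell} G_j(A;k) 2^{-j}, valid for ell > 2k."""
    scale = Fraction(2) ** (k + 2 - len(A) - ell)
    if ell % 2 == 1:
        return scale * 3 ** ((ell - 2 * k + 1) // 2)
    return 5 * scale * 3 ** ((ell - 2 * k - 2) // 2)

def B_h(A, k, h):
    """All B subset of [0, h] with B ∩ [0, k] = A and [k+1, h] contained in B + B."""
    result = []
    for X in subsets(k + 1, h):
        s2 = sumset(A | X)
        if all(x in s2 for x in range(k + 1, h + 1)):
            result.append(A | X)
    return result

def sigma_bounds(A, k, J, h):
    """Return (sigma_minus, sigma_plus) following (13) and (19)."""
    if not k < h <= J:
        raise ValueError("need k < h <= J")
    A = frozenset(A)
    # (13): subtract the exactly computed terms j = k+1, ..., J.
    sigma_plus = Fraction(1, 2 ** k) - sum(
        Fraction(G(A, k, j), 2 ** j) for j in range(k + 1, J + 1))
    # (18): bound the tail j > J by splitting over B in B_h(A;k).
    tail = Fraction(0)
    for B in B_h(A, k, h):
        tail += G_tilde(B, h, max(2 * h + 1, J + 1))
        tail += sum(Fraction(G_hat(B, h, j), 2 ** j)
                    for j in range(J + 1, 2 * h + 1))
    return sigma_plus - tail, sigma_plus

lo, hi = sigma_bounds({0, 1}, 2, 8, 5)
```

Raising $J$ can only lower $\sigma_+$. The earlier $\sigma_-$ must still lie below the new $\sigma_+$, which makes a convenient consistency check.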

The strength of Lemma 3.3 lies in the following observation. Lemma 3.1 only takes into account the restriction that the last element is not in the sum set. Lemma 3.3 additionally takes into account the restriction that the first few elements after $k$ are in the sum set.



3.3 Estimating $\rho$

Now that we know how to estimate $\sigma(A;k)$ for any particular $(A;k)$, we can obtain estimates for $\rho(A,B;k) = \sigma(A;k)\sigma(B;k)$ (Proposition 2.10) by

$$\rho_-(A,B;k) = \sigma_-(A;k)\,\sigma_-(B;k), \qquad \rho_+(A,B;k) = \sigma_+(A;k)\,\sigma_+(B;k),$$

where the formulas for $\sigma_+$ and $\sigma_-$ are found in (13) and (19), respectively. Then, using (11), Proposition 2.14, and Lemma 2.15, we obtain the following estimates for $\rho$:



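$$\sum_{k\le K}\ \sum_{(A,B;k)} \rho_-(A,B;k) \;\le\; \rho \;\le\; \sum_{k\le K}\ \sum_{(A,B;k)} \rho_+(A,B;k) \;+\; E_K, \tag{20}$$

where the sums are taken over all minimal MSTD fringe pairs $(A,B;k)$ with $k \le K$, and $E_K$ denotes the explicit error term supplied by Proposition 2.14 and Lemma 2.15, which involves $(3/4)^{K/2}$.
This completes the description of the algorithm used to estimate $\rho$.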

3.4 Numerical results and comments 

The program was written in Java. All source code is available online at

http://web.mit.edu/yufeiz/www/mstd_density_code.zip



All calculations were performed using exact rational arithmetic. We ran the computation with the following parameters:

$$K = 20, \qquad J = 37, \qquad h_k = \begin{cases} 30, & \text{if } k \le 10,\\ k + W, & \text{if } k > 10.\end{cases}$$



The entire computation took a combined processing time of approximately one week on a single 2.8 GHz processor. The results of the computation are shown in Table 1. Using (20) and the data in Table 1, we obtain

$$\rho > 4.286 \times 10^{-4}.$$
Table 1: Results of the computation. The column $|\{(*,*;k)\}|$ contains the number of minimal MSTD fringe pairs of order $k$. The column $\sum\rho_-(*,*;k)$ contains the sum of the lower bounds $\rho_-(A,B;k)$ over all minimal MSTD fringe pairs $(A,B)$ of a fixed order $k$, and similarly for the column $\sum\rho_+(*,*;k)$.



k      |{(*,*;k)}|     Σρ_-(*,*;k) (×10^-4)     Σρ_+(*,*;k) (×10^-4)
≤ 8    …               …                        …
9      106             0.30178                  0.30468
10     396             0.41411                  0.41840
11     1034            0.34795                  0.35339
12     3120            0.29209                  0.29707
13     8316            0.24097                  0.24529
14     26390           0.21456                  0.21867
15     71594           0.18176                  0.18538
16     211356          0.13581                  0.13878
17     612824          0.12414                  0.12701
18     1746622         0.08570                  0.08792
19     5331566         0.08035                  0.08280
20     14747652        0.05438                  0.05624
Σ                      4.28602                  4.34262



Unfortunately, the upper bound that we obtain is rather disappointing, because the error term in the upper estimate in (20) decreases very slowly with $K$. Even with $K = 20$, adding this error term to the sum $4.343 \times 10^{-4}$ from Table 1 only yields $\rho < 0.43$.



Monte Carlo experiments suggest that $\rho$ is around $4.5 \times 10^{-4}$. The weakness in our estimates therefore lies in the upper error term rather than in the sum itself. If we increase $K$, we should be able to get a better lower bound, but the upper bound would still be far off. The rightmost column sum in Table 1 is an upper bound to the best possible lower bound on $\rho$ that we could obtain without increasing $K$. Unfortunately, each increment in $K$ would increase the total computation time by a factor of about four, mostly due to the search for minimal MSTD fringe pairs. Most of our computation time is spent on complete searches through all subsets of a set (in computing the fringe pairs, $G_j(A;k)$, and $\mathcal{B}_h(A;k)$). It may therefore be worthwhile to develop more efficient search algorithms.
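For comparison with the deterministic bounds, a Monte Carlo estimate of the kind mentioned above takes only a few lines. The following is a minimal sketch of our own, not the experiment behind the figure $4.5\times10^{-4}$. It samples uniform subsets of $[0,n]$ for a fixed finite $n$, so it estimates the finite-$n$ proportion rather than the limit $\rho$, and it gives no provable bounds. Since $\rho$ is near $4.5\times10^{-4}$, a relative standard error of a few percent needs on the order of $10^7$ samples, which is slow in pure Python.

```python
import random

def is_mstd(S):
    """True when |S+S| > |S-S|."""
    sums = {a + b for a in S for b in S}
    diffs = {a - b for a in S for b in S}
    return len(sums) > len(diffs)

def monte_carlo_mstd(n, trials, seed=0):
    """Fraction of `trials` uniform random subsets of [0, n] that are MSTD."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        S = [x for x in range(n + 1) if rng.getrandbits(1)]
        hits += is_mstd(S)
    return hits / trials

print(is_mstd([0, 2, 3, 4, 7, 11, 12, 14]))   # True (Conway's set)
```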
4 Extensions to other sum-difference characterizations



We have just studied the probability that a random subset $S \subseteq [0,n]$ is an MSTD set. What if we ask finer questions, such as the probability that $|S+S| - |S-S| = x$ for some fixed integer $x$? Or the probability that $S$ is missing exactly $s$ sums and $d$ differences? It turns out that our methods can easily be adapted to deal with all these questions.
Recall from the introduction that

$$\Delta(S) = \bigl(2n+1-|S+S|,\ 2n+1-|S-S|\bigr)$$

is the pair consisting of the number of missing sums and the number of missing differences. Fix a subset $\Lambda \subseteq \mathbb{Z}_{\ge 0}\times\mathbb{Z}_{\ge 0}$. We are interested in the collection

$$\{S \subseteq [0,n] : \Delta(S) \in \Lambda\}.$$

Let $\rho_n^\Lambda$ be the probability that a uniform random subset $S \subseteq [0,n]$ falls into this collection. In this section we prove Theorem 1.3, showing that $\rho_n^\Lambda$ approaches a limit as $n \to \infty$. By choosing $\Lambda = \{(s,d) : s < d\}$ we recover the MSTD problem.
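The quantity $\Delta(S)$ is straightforward to compute. In the small sketch below (the function names are ours), Conway's set $\{0,2,3,4,7,11,12,14\}$ from the introduction is viewed as a subset of $[0,14]$. It has $|S+S| = 26$ and $|S-S| = 25$, so it misses $3$ sums and $4$ differences. Since $|S+S| - |S-S| = d - s$, membership in $\Lambda = \{(s,d): s<d\}$ is the MSTD condition.

```python
def missing_profile(S, n):
    """Delta(S) = (2n + 1 - |S+S|, 2n + 1 - |S-S|) for S a subset of [0, n]."""
    S = set(S)
    sums = {a + b for a in S for b in S}
    diffs = {a - b for a in S for b in S}
    return (2 * n + 1 - len(sums), 2 * n + 1 - len(diffs))

def in_mstd_lambda(profile):
    """Membership in Lambda = {(s, d) : s < d}, i.e. |S+S| > |S-S|."""
    s, d = profile
    return s < d

conway = [0, 2, 3, 4, 7, 11, 12, 14]
print(missing_profile(conway, 14))                  # (3, 4)
print(in_mstd_lambda(missing_profile(conway, 14)))  # True
```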

Most of the main ideas for the MSTD case carry over to the general case, so we only sketch the modifications. As with the MSTD problem, we also have a deterministic algorithm for computing arbitrarily good bounds for each limit $\rho^\Lambda$. We will not discuss the computational aspect in much detail, as it is similar to Section 3. However, even in the case $\Lambda = \{(s,d) : s < d\}$, the general algorithm described here is much slower than the more specialized algorithm for MSTD sets given earlier. Unlike in Section 3, we do not actually carry out the computations, so we make no effort at optimization.

The main difference between the solution of the MSTD case presented earlier and the solution 
to the general case is that we need to consider a more restrictive analogue of rich sets. 

Definition 4.1. Let $k$ and $n$ be positive integers with $2k < n$, and let $S$ be a subset of $[0,n]$. We say that $S$ is $k$-affluent if $[k+1,\,2n-k-1] \subseteq S+S$ and $[-n+k+1,\,n-k-1] \subseteq S-S$.

Whereas rich sets have all the middle sums present, affluent sets additionally have all the 
middle differences present. 
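Definition 4.1 translates directly into a membership test. Below is a minimal Python sketch, where `is_affluent` is a name of our choosing. It takes the middle differences to be $[-n+k+1,\,n-k-1]$, the natural range because $S-S \subseteq [-n,n]$. Like the definition, it is meant to be used with $2k < n$.

```python
def is_affluent(S, n, k):
    """k-affluent: all middle sums [k+1, 2n-k-1] lie in S+S and
    all middle differences [-n+k+1, n-k-1] lie in S-S."""
    S = set(S)
    sums = {a + b for a in S for b in S}
    diffs = {a - b for a in S for b in S}
    return (all(x in sums for x in range(k + 1, 2 * n - k))
            and all(x in diffs for x in range(-n + k + 1, n - k)))
```

The full interval $[0,10]$ is $k$-affluent for every admissible $k$. Conway's set, viewed in $[0,14]$, is not $k$-affluent for any $k$ with $2k < 14$, because the sum $20$ and the difference $6$ are missing.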

4.1 Affluent sets with given fringe pair 

In this section we consider the probability that a random $S \subseteq [0,n]$ has a particular fringe profile and is also affluent. The ideas here are very similar to the ones in Sections 2.1 and 2.2. The main difference is that we no longer have an analogue of semi-rich sets, because the constraint of being affluent cannot easily be divided into two nearly independent halves.
We need a more general notion of fringe pairs to work with affluent sets.
Definition 4.2. A fringe pair of order $k$ is a pair of subsets $(A,B)$ of $[0,k]$ (also denoted $(A,B;k)$). We impose the following partial order on fringe pairs: $(A,B;k) \ge (A',B';k')$ if $k \ge k'$ and

$$A' = A\cap[0,k'], \qquad B' = B\cap[0,k'], \qquad [k'+1,k] \subseteq A+A,\ B+B,\ A+B.$$

Note that, unlike for MSTD fringe pairs, we do not require $0 \in A$ or $0 \in B$ here. We previously imposed this requirement as a computational optimization.

We say that a $k$-affluent subset $S \subseteq [0,n]$ has fringe pair $(A,B;k)$ (note that it is the same $k$) if $S\cap[0,k] = A$ and $(n-S)\cap[0,k] = B$.

The partial order for fringe pairs is stronger than the version used to study MSTD sets. As 
with MSTD fringe pairs, we can speak of minimal fringe pairs, as well as the minimal fringe pair of 
an affluent set. We will count affluent sets by minimal fringe pairs in the same way as we counted 
rich MSTD sets by minimal MSTD fringe pairs. 
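The partial order interacts with affluence in a simple way. Suppose $S$ is $k$-affluent and $k' < k$. Then $S$ is also $k'$-affluent exactly when its order-$k$ fringe pair dominates its order-$k'$ fringe pair. This can be shown by comparing which sums and differences near the ends come from $A$ and $B$. The Python sketch below (helper names ours) does not prove it; it only checks the equivalence exhaustively over all subsets of $[0,n]$ for small $n$.

```python
def sumset(X, Y):
    return {x + y for x in X for y in Y}

def is_affluent(S, n, k):
    """Middle sums [k+1, 2n-k-1] and middle differences [-n+k+1, n-k-1] all present."""
    S = set(S)
    sums = sumset(S, S)
    diffs = {a - b for a in S for b in S}
    return (all(x in sums for x in range(k + 1, 2 * n - k))
            and all(x in diffs for x in range(-n + k + 1, n - k)))

def fringe_pair(S, n, k):
    """The pair (S ∩ [0, k], (n - S) ∩ [0, k])."""
    A = frozenset(x for x in S if x <= k)
    B = frozenset(n - x for x in S if n - x <= k)
    return A, B

def dominates(A, B, k, A2, B2, k2):
    """True when (A,B;k) >= (A2,B2;k2) in the partial order of Definition 4.2."""
    if k < k2:
        return False
    low = frozenset(range(k2 + 1))
    if A2 != A & low or B2 != B & low:
        return False
    AA, BB, AB = sumset(A, A), sumset(B, B), sumset(A, B)
    return all(x in AA and x in BB and x in AB for x in range(k2 + 1, k + 1))

def check_fringe_order(n, k):
    """Exhaustively compare domination with k2-affluence for k-affluent S in [0, n]."""
    for mask in range(1 << (n + 1)):
        S = {x for x in range(n + 1) if mask >> x & 1}
        if not is_affluent(S, n, k):
            continue
        A, B = fringe_pair(S, n, k)
        for k2 in range(k):
            A2, B2 = fringe_pair(S, n, k2)
            if dominates(A, B, k, A2, B2, k2) != is_affluent(S, n, k2):
                return False
    return True
```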

Let $(A,B;k)$ be a fringe pair and let $n > 2k$. Let

$$\mu_n(A,B;k) = 2^{-n-1}\,\bigl|\{S \subseteq [0,n] : S\cap[0,k] = A,\ (n-S)\cap[0,k] = B,\ \text{and } S \text{ is } k\text{-affluent}\}\bigr|.$$

Then $\mu_n(A,B;k)$ is the probability that a uniformly random $S \subseteq [0,n]$ (no longer imposing that $0, n \in S$) is $k$-affluent with fringe pair $(A,B;k)$. Let

$$\mu(A,B;k) = \lim_{n\to\infty}\mu_n(A,B;k).$$

The following proposition shows that the limit exists. The result is the analogue of Propositions 2.9 and 2.10.



Proposition 4.3. For every $A, B \subseteq [0,k]$, the limit $\mu(A,B;k) = \lim_{n\to\infty}\mu_n(A,B;k)$ exists.

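Proof. Assume throughout that $n > 2k$ and $S$ is a uniform random subset of $[0,n]$. We say that $S$ is $k$-quasi-affluent if

$$[k+1,\,2n-k-1]\setminus\bigl[\lfloor n/2\rfloor,\ 2n-\lfloor n/2\rfloor\bigr] \subseteq S+S \quad\text{and}\quad [-n+k+1,\,n-k-1]\setminus\bigl[-n+\lfloor n/2\rfloor,\ n-\lfloor n/2\rfloor\bigr] \subseteq S-S.$$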



Let $\mu'_n(A,B;k)$ denote the probability that $S$ is $k$-quasi-affluent with fringe pair $(A,B;k)$. If $S$ is $k$-quasi-affluent but not $k$-affluent, then it is necessarily missing some middle sum or middle difference. We can therefore use Lemma 2.12, or an argument analogous to the proof of Proposition 2.10, to see that this probability goes to zero as $n\to\infty$. In other words,

$$\lim_{n\to\infty}\bigl(\mu'_n(A,B;k) - \mu_n(A,B;k)\bigr) = 0.$$
Thus it suffices to evaluate $\lim_{n\to\infty}\mu'_n(A,B;k)$. Let $m = \lfloor n/2\rfloor - 1$,

$$L = S\cap[0,m], \qquad R = (n-S)\cap[0,m].$$

Then the condition that $S$ is $k$-quasi-affluent with fringe pair $(A,B;k)$ is equivalent to

$$L\cap[0,k] = A,\quad R\cap[0,k] = B,\quad [k+1,m] \subseteq L+L,\ R+R,\ L+R. \tag{21}$$

The elements of $S$ in $[m+1,\,n-m-1]$ are unconstrained. Hence the number of pairs $(L,R)$ of subsets of $[0,m]$ satisfying (21) equals $\mu'_n(A,B;k)\,2^{2(m+1)}$.

As in the arguments in Section 2.2, we can compute $\mu'_n(A,B;k)\,2^{2(m+1)}$ by considering the complement of the set of pairs $(L,R)$ satisfying (21). The complement can be partitioned by the smallest element greater than $k$ that is missing from at least one of $L+L$, $R+R$, $L+R$. For $j > k$, let $N_j(A,B;k)$ denote the number of pairs $(U,V)$ of subsets of $[0,j]$ such that

$$U\cap[0,k] = A,\quad V\cap[0,k] = B,\quad [k+1,j-1] \subseteq U+U,\ V+V,\ U+V,$$

and at least one of $U+U$, $V+V$, $U+V$ is missing $j$.

Then

$$\mu'_n(A,B;k)\,2^{2(m+1)} = 2^{2(m-k)} - \sum_{j=k+1}^{m} N_j(A,B;k)\,2^{2(m-j)},$$

hence

$$\mu'_n(A,B;k) = 2^{-2k-2} - \sum_{j=k+1}^{\lfloor n/2\rfloor - 1} N_j(A,B;k)\,2^{-2j-2}. \tag{22}$$

Since the quantities $\mu'_n(A,B;k)$ and $N_j(A,B;k)$ are all nonnegative, letting $n\to\infty$ shows that the limit

$$\mu(A,B;k) = \lim_{n\to\infty}\mu'_n(A,B;k) = 2^{-2k-2} - \sum_{j=k+1}^{\infty} N_j(A,B;k)\,2^{-2j-2} \tag{23}$$

exists. □

Each $\mu(A,B;k)$ can be computed to arbitrary precision using (23). Indeed, any individual term $N_j(A,B;k)$ can be computed explicitly using a complete search. The tail sum can be bounded using methods analogous to the ones in Section 3.2.
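A small Python sketch (function names ours) makes the counting behind (22) and (23) concrete. `count_pairs` counts the pairs $(L,R)$ satisfying (21) directly, and `N` computes $N_j(A,B;k)$ by complete search. The identity $\#\{(L,R)\} = 2^{2(m-k)} - \sum_{j=k+1}^{m} N_j(A,B;k)\,2^{2(m-j)}$ can then be checked for small $m$. Truncating (23) after $j = J$ gives an upper bound on $\mu(A,B;k)$, because every omitted term is nonnegative.

```python
from fractions import Fraction
from itertools import combinations

def subsets(lo, hi):
    """Yield every subset of the integer interval [lo, hi] as a frozenset."""
    elems = range(lo, hi + 1)
    for r in range(len(elems) + 1):
        for combo in combinations(elems, r):
            yield frozenset(combo)

def sums(X, Y):
    return {x + y for x in X for y in Y}

def covers(U, V, lo, hi):
    """[lo, hi] is contained in each of U+U, V+V and U+V."""
    UU, VV, UV = sums(U, U), sums(V, V), sums(U, V)
    return all(x in UU and x in VV and x in UV for x in range(lo, hi + 1))

def N(A, B, k, j):
    """N_j(A,B;k) by complete search over pairs (U, V) of subsets of [0, j]."""
    A, B = frozenset(A), frozenset(B)
    count = 0
    for X in subsets(k + 1, j):
        for Y in subsets(k + 1, j):
            U, V = A | X, B | Y
            if covers(U, V, k + 1, j - 1) and not covers(U, V, j, j):
                count += 1
    return count

def count_pairs(A, B, k, m):
    """Number of pairs (L, R) of subsets of [0, m] satisfying (21)."""
    A, B = frozenset(A), frozenset(B)
    return sum(1 for X in subsets(k + 1, m) for Y in subsets(k + 1, m)
               if covers(A | X, B | Y, k + 1, m))

def mu_upper(A, B, k, J):
    """Truncation of (23) after j = J: an upper bound on mu(A,B;k)."""
    return Fraction(1, 4 ** (k + 1)) - sum(
        Fraction(N(A, B, k, j), 4 ** (j + 1)) for j in range(k + 1, J + 1))
```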

4.2 Almost all sets are affluent 

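Let $\Lambda \subseteq \mathbb{Z}_{\ge 0}\times\mathbb{Z}_{\ge 0}$ and

$$\rho_n^\Lambda = 2^{-n-1}\,\bigl|\{S \subseteq [0,n] : \Delta(S) \in \Lambda\}\bigr|.$$

For a fringe pair $(A,B;k)$, define

$$\Delta(A,B;k) = \bigl(2(k+1) - |(A+A)\cap[0,k]| - |(B+B)\cap[0,k]|,\ \ 2\bigl(k+1-|(A+B)\cap[0,k]|\bigr)\bigr).$$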
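If $S$ is $k$-affluent with fringe pair $(A,B;k)$, then $\Delta(S) = \Delta(A,B;k)$. Indeed, all middle sums and differences are present, so every missing sum lies in $[0,k]$ or $[2n-k,2n]$, and every missing difference lies in $[-n,-n+k]$ or $[n-k,n]$. These are determined by $A$ and $B$. The following result is the generalization of Propositions 2.14 and 2.16.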
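The identity $\Delta(S) = \Delta(A,B;k)$ for affluent $S$ is easy to confirm exhaustively for small $n$. This Python sketch is our own check, and all names in it are ours. It takes the middle differences of Definition 4.1 to be $[-n+k+1,\,n-k-1]$.

```python
def sumset(X, Y):
    return {x + y for x in X for y in Y}

def is_affluent(S, n, k):
    S = set(S)
    sums = sumset(S, S)
    diffs = {a - b for a in S for b in S}
    return (all(x in sums for x in range(k + 1, 2 * n - k))
            and all(x in diffs for x in range(-n + k + 1, n - k)))

def missing_profile(S, n):
    """Delta(S) = (2n+1-|S+S|, 2n+1-|S-S|)."""
    S = set(S)
    return (2 * n + 1 - len(sumset(S, S)),
            2 * n + 1 - len({a - b for a in S for b in S}))

def fringe_delta(A, B, k):
    """Delta(A,B;k) as defined above."""
    AA, BB, AB = sumset(A, A), sumset(B, B), sumset(A, B)
    low = range(k + 1)
    s = 2 * (k + 1) - sum(x in AA for x in low) - sum(x in BB for x in low)
    d = 2 * (k + 1 - sum(x in AB for x in low))
    return (s, d)

def check_delta(n, k):
    """True if Delta(S) == Delta(A,B;k) for every k-affluent S subset of [0, n]."""
    for mask in range(1 << (n + 1)):
        S = {x for x in range(n + 1) if mask >> x & 1}
        if is_affluent(S, n, k):
            A = {x for x in S if x <= k}
            B = {n - x for x in S if n - x <= k}
            if missing_profile(S, n) != fringe_delta(A, B, k):
                return False
    return True
```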

Proposition 4.4. As $n\to\infty$, $\rho_n^\Lambda$ converges to a limit $\rho^\Lambda$, and

$$\rho^\Lambda = \sum_{\Delta(A,B;k)\in\Lambda} \mu(A,B;k), \tag{24}$$

where the sum is taken over all minimal fringe pairs $(A,B;k)$ satisfying $\Delta(A,B;k) \in \Lambda$.
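Proof. An argument similar to the proof of Proposition 2.14 shows that, for every $K$ with $2K < n$,

$$\sum_{\substack{\Delta(A,B;k)\in\Lambda\\ k\le K}} \mu_n(A,B;k) \;\le\; \rho_n^\Lambda \;\le\; \sum_{\substack{\Delta(A,B;k)\in\Lambda\\ k\le K}} \mu_n(A,B;k) + \mathbb{P}(S \text{ is not } K\text{-affluent}), \tag{25}$$

where $S$ is a uniform random subset of $[0,n]$ and both sums run over minimal fringe pairs. The error term in the upper bound is bounded explicitly using Lemma 2.13. Letting $n\to\infty$, and then $K\to\infty$, shows that the limit $\rho^\Lambda = \lim_{n\to\infty}\rho_n^\Lambda$ exists and is equal to the expression in (24). □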

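To compute lower and upper bounds for $\rho^\Lambda$, we let $n\to\infty$ in (25) to get

$$\sum_{\substack{\Delta(A,B;k)\in\Lambda\\ k\le K}} \mu(A,B;k) \;\le\; \rho^\Lambda \;\le\; \sum_{\substack{\Delta(A,B;k)\in\Lambda\\ k\le K}} \mu(A,B;k) + \limsup_{n\to\infty}\mathbb{P}(S_n \text{ is not } K\text{-affluent}),$$

where $S_n$ is a uniform random subset of $[0,n]$ and the last term is bounded explicitly by Lemma 2.13.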

Proof of Theorem 1.3. The theorem follows almost immediately from Proposition 4.4. The first assertion is a direct consequence of Proposition 4.4. The second assertion, that $\rho^\Lambda > 0$ as long as $\Lambda$ contains some element $(s,d)$ with $d$ even, follows from [1^ Thm. 8]. For the final assertion, since $\mu(A,B;k) \ge 0$, the sum in (24) can be partitioned according to the value of $\Delta(A,B;k)$ to obtain

$$\rho^\Lambda = \sum_{\Delta(A,B;k)\in\Lambda} \mu(A,B;k) = \sum_{(s,d)\in\Lambda}\Bigl(\sum_{\Delta(A,B;k)=(s,d)} \mu(A,B;k)\Bigr) = \sum_{(s,d)\in\Lambda} \rho^{(s,d)}. \qquad\square$$



Proposition 4.4 can be used to compute estimates for $\rho^\Lambda$ in the same way as in the MSTD case. The only step that we are missing is bounding the $N_j(A,B;k)$ terms. We omit this discussion since it is very similar to bounding $G_j(A;k)$ as we did in Section 3.2.

5 Structure of a random set characterized by $\Lambda$

Let $\Lambda \subseteq \mathbb{Z}_{\ge 0}\times\mathbb{Z}_{\ge 0}$ contain at least one element $(s,d)$ with $d$ even, so that $\rho^\Lambda > 0$. In this section, we study the structure of a random subset $S \subseteq [0,n]$ conditioned on $\Delta(S) \in \Lambda$. Our main result, stated below, says that the middle segment of $S$ is nearly unrestricted and independent of the fringe choice. Theorem 5.1 formalizes the intuition that the fringe of an MSTD set matters a lot while the other elements matter very little.

Theorem 5.1. Let $\Lambda \subseteq \mathbb{Z}_{\ge 0}\times\mathbb{Z}_{\ge 0}$, where $\Lambda$ contains at least one element $(s,d)$ with $d$ even. Suppose we have an integer sequence $a_n$ satisfying $0 < a_n < n/2$ and $a_n\to\infty$ as $n\to\infty$. Let $\varepsilon > 0$. Then for all sufficiently large $n$ the following is true.

Let $S$ be a uniform random subset of $[0,n]$, $E$ an event that depends only on $S\cap[a_n+1,\,n-a_n-1]$, and $F$ an event that depends only on $S\cap\bigl([0,a_n]\cup[n-a_n,n]\bigr)$. Then

$$\bigl|\mathbb{P}(E\cap F \mid \Delta(S)\in\Lambda) - \mathbb{P}(E)\,\mathbb{P}(F\mid\Delta(S)\in\Lambda)\bigr| \le (1+\varepsilon)\,\frac{24\,(3/4)^{a_n/2}}{(2-\sqrt{3})\,\rho^\Lambda}.$$

Note that the bound approaches zero as $n\to\infty$. Intuitively, this says that the structure of the middle portion of a random MSTD set is close to that of an unrestricted set.

Corollary 5.2. Let $\Lambda$ and $a_n$ satisfy the hypotheses of Theorem 5.1. For each $n$, let $S_n$ be a uniform random subset of $[0,n]$ and $E_n$ an event that depends only on $S_n\cap[a_n+1,\,n-a_n-1]$. Suppose that $\lim_{n\to\infty}\mathbb{P}(E_n)$ exists. Then

$$\lim_{n\to\infty}\mathbb{P}(E_n \mid \Delta(S_n)\in\Lambda) = \lim_{n\to\infty}\mathbb{P}(E_n).$$

Proof. In Theorem 5.1, let $F$ be the event that includes all outcomes. □

In this section we prove Theorem 5.1 and give some applications. The proofs are mostly independent of the results in previous sections. Although we assume the existence of the limit $\rho^\Lambda$, it suffices to know that $\liminf_{n\to\infty}\rho_n^\Lambda > 0$. We also use the notion of affluent sets, defined at the beginning of Section 4.

5.1 Proof of Theorem 5.1

We would like to slightly perturb the event on which we are conditioning. The following lemma 
shows that this modification does not change the probability very much. 

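Lemma 5.3. Let $A, B, E$ be three events such that $A \subseteq B$ and $\mathbb{P}(A) > 0$. Then

$$\bigl|\mathbb{P}(E\mid A) - \mathbb{P}(E\mid B)\bigr| \le \frac{2\,\mathbb{P}(B\setminus A)}{\mathbb{P}(B)}.$$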



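Proof. We have

$$\begin{aligned}
\bigl|\mathbb{P}(E\mid A) - \mathbb{P}(E\mid B)\bigr| &= \left|\frac{\mathbb{P}(E\cap A)}{\mathbb{P}(A)} - \frac{\mathbb{P}(E\cap B)}{\mathbb{P}(B)}\right| \\
&= \frac{\bigl|\mathbb{P}(E\cap A)\mathbb{P}(B) - \mathbb{P}(E\cap B)\mathbb{P}(A)\bigr|}{\mathbb{P}(A)\mathbb{P}(B)} \\
&= \frac{\bigl|\mathbb{P}(E\cap A)\mathbb{P}(B) - \mathbb{P}(E\cap A)\mathbb{P}(A) + \mathbb{P}(E\cap A)\mathbb{P}(A) - \mathbb{P}(E\cap B)\mathbb{P}(A)\bigr|}{\mathbb{P}(A)\mathbb{P}(B)} \\
&\le \frac{\mathbb{P}(E\cap A)\,\bigl|\mathbb{P}(B) - \mathbb{P}(A)\bigr| + \bigl|\mathbb{P}(E\cap A) - \mathbb{P}(E\cap B)\bigr|\,\mathbb{P}(A)}{\mathbb{P}(A)\mathbb{P}(B)} \\
&= \frac{\mathbb{P}(E\cap A)\,\mathbb{P}(B\setminus A) + \mathbb{P}\bigl(E\cap(B\setminus A)\bigr)\,\mathbb{P}(A)}{\mathbb{P}(A)\mathbb{P}(B)} \\
&\le \frac{\mathbb{P}(A)\,\mathbb{P}(B\setminus A) + \mathbb{P}(B\setminus A)\,\mathbb{P}(A)}{\mathbb{P}(A)\mathbb{P}(B)} \\
&= \frac{2\,\mathbb{P}(B\setminus A)}{\mathbb{P}(B)},
\end{aligned}$$

as desired. □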
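Lemma 5.3 is elementary, but a quick randomized check in exact arithmetic catches factor or sign slips when the inequality is adapted. The harness below is our own and is not part of the argument. It draws random finite probability spaces with rational weights and random events $A \subseteq B$ and $E$, then verifies the inequality exactly with `fractions.Fraction`.

```python
import random
from fractions import Fraction

def prob(P, event):
    """Probability of an event (a set of outcomes) under the weights P."""
    return sum((P[w] for w in event), Fraction(0))

def cond(P, E, F):
    """Conditional probability P(E | F); requires P(F) > 0."""
    return prob(P, E & F) / prob(P, F)

def lemma53_holds(trials=2000, size=8, seed=1):
    rng = random.Random(seed)
    omega = range(size)
    for _ in range(trials):
        weights = [rng.randint(1, 10) for _ in omega]   # every outcome has positive mass
        total = sum(weights)
        P = {w: Fraction(weights[w], total) for w in omega}
        B = {w for w in omega if rng.random() < 0.7}
        A = {w for w in B if rng.random() < 0.7}         # A is a subset of B
        E = {w for w in omega if rng.random() < 0.5}
        if not A:
            continue                                     # the lemma assumes P(A) > 0
        lhs = abs(cond(P, E, A) - cond(P, E, B))
        rhs = 2 * prob(P, B - A) / prob(P, B)
        if lhs > rhs:
            return False
    return True
```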

We would like to slightly perturb the event being conditioned so that it becomes independent 
of the middle segment of S. We do so by adding and removing some non-affluent sets into the 
event. This is the idea behind the following proposition which leads directly to the theorem. 

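Proposition 5.4. Let $\Lambda \subseteq \mathbb{Z}_{\ge 0}\times\mathbb{Z}_{\ge 0}$, let $2k < n$ be positive integers, and let $S$ be a uniform random subset of $[0,n]$. Assume that $\mathbb{P}(\Delta(S)\in\Lambda \text{ and } S \text{ is } k\text{-affluent}) > 0$. Let $E$ be an event that depends only on $S\cap[k+1,\,n-k-1]$, and $F$ an event that depends only on $S\cap\bigl([0,k]\cup[n-k,n]\bigr)$. Then

$$\bigl|\mathbb{P}(E\cap F\mid\Delta(S)\in\Lambda) - \mathbb{P}(E)\,\mathbb{P}(F\mid\Delta(S)\in\Lambda)\bigr| \le \frac{8\,\mathbb{P}(S \text{ is not } k\text{-affluent})}{\mathbb{P}(\Delta(S)\in\Lambda \text{ and } S \text{ is } k\text{-affluent})}.$$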
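Proof. Consider the following events:

$$\begin{aligned}
A &= \{\Delta(S)\in\Lambda\},\\
B &= \{\Delta(S)\in\Lambda \text{ and } S \text{ is } k\text{-affluent}\},\\
C &= \{\exists\, T\subseteq[0,n] :\ \Delta(T)\in\Lambda,\ T \text{ is } k\text{-affluent},\ S\cap[0,k] = T\cap[0,k],\ S\cap[n-k,n] = T\cap[n-k,n]\},\\
D &= \{S \text{ is not } k\text{-affluent}\}.
\end{aligned}$$

It is easy to see that $B \subseteq A$ and $B \subseteq C$. Furthermore, $A\setminus B \subseteq D$ and $C\setminus B \subseteq D$. The latter follows from the observation that if $C$ occurs and $S$ is $k$-affluent, then $S+S = T+T$ and $S-S = T-T$, so that $\Delta(S) = \Delta(T) \in \Lambda$ and hence $B$ occurs as well.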
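Applying Lemma 5.3 to the inclusions $B \subseteq A$ and $B \subseteq C$, and using $\mathbb{P}(A), \mathbb{P}(C) \ge \mathbb{P}(B)$, we have

$$\bigl|\mathbb{P}(E\cap F\mid A) - \mathbb{P}(E\cap F\mid B)\bigr| \le \frac{2\,\mathbb{P}(A\setminus B)}{\mathbb{P}(A)} \le \frac{2\,\mathbb{P}(D)}{\mathbb{P}(B)},$$

$$\bigl|\mathbb{P}(E\cap F\mid C) - \mathbb{P}(E\cap F\mid B)\bigr| \le \frac{2\,\mathbb{P}(C\setminus B)}{\mathbb{P}(C)} \le \frac{2\,\mathbb{P}(D)}{\mathbb{P}(B)}.$$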
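So combining the two inequalities gives us

$$\bigl|\mathbb{P}(E\cap F\mid A) - \mathbb{P}(E\cap F\mid C)\bigr| \le \frac{4\,\mathbb{P}(D)}{\mathbb{P}(B)}. \tag{26}$$

Similarly, we have

$$\bigl|\mathbb{P}(E)\,\mathbb{P}(F\mid A) - \mathbb{P}(E)\,\mathbb{P}(F\mid C)\bigr| \le \frac{4\,\mathbb{P}(E)\,\mathbb{P}(D)}{\mathbb{P}(B)} \le \frac{4\,\mathbb{P}(D)}{\mathbb{P}(B)}. \tag{27}$$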

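Now, $E$ depends only on $S\cap[k+1,\,n-k-1]$, while $F$ and $C$ depend only on $S\cap\bigl([0,k]\cup[n-k,n]\bigr)$. So $E$ is independent of $F\cap C$, and thus $\mathbb{P}(E\cap F\mid C) = \mathbb{P}(E)\,\mathbb{P}(F\mid C)$. Combining (26) and (27) then gives us

$$\begin{aligned}
\bigl|\mathbb{P}(E\cap F\mid A) - \mathbb{P}(E)\,\mathbb{P}(F\mid A)\bigr| &\le \bigl|\mathbb{P}(E\cap F\mid A) - \mathbb{P}(E\cap F\mid C)\bigr| + \bigl|\mathbb{P}(E)\,\mathbb{P}(F\mid A) - \mathbb{P}(E)\,\mathbb{P}(F\mid C)\bigr| \\
&\le \frac{8\,\mathbb{P}(D)}{\mathbb{P}(B)},
\end{aligned}$$

as desired. □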

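Proof of Theorem 5.1. Let $S_n$ denote a uniform random subset of $[0,n]$. By Proposition 5.4, it suffices to show that

$$\limsup_{n\to\infty}\ \frac{8\,\mathbb{P}(S_n \text{ is not } a_n\text{-affluent})\,(3/4)^{-a_n/2}}{\mathbb{P}(\Delta(S_n)\in\Lambda \text{ and } S_n \text{ is } a_n\text{-affluent})} \le \frac{24}{(2-\sqrt{3})\,\rho^\Lambda}.$$

By Lemma 2.13, $\mathbb{P}(S_n \text{ is not } a_n\text{-affluent})$ is at most $3(3/4)^{a_n/2}/(2-\sqrt{3})$ plus further terms that, under our hypotheses on $a_n$, are of smaller order:

$$\mathbb{P}(S_n \text{ is not } a_n\text{-affluent}) \le \frac{3\,(3/4)^{a_n/2}}{2-\sqrt{3}} + o\bigl((3/4)^{a_n/2}\bigr), \tag{28}$$

so that

$$\limsup_{n\to\infty}\ \mathbb{P}(S_n \text{ is not } a_n\text{-affluent})\,(3/4)^{-a_n/2} \le \frac{3}{2-\sqrt{3}}.$$

By (28) and Theorem 1.3, we have

$$\lim_{n\to\infty}\mathbb{P}(\Delta(S_n)\in\Lambda \text{ and } S_n \text{ is } a_n\text{-affluent}) = \lim_{n\to\infty}\mathbb{P}(\Delta(S_n)\in\Lambda) = \rho^\Lambda.$$

The theorem then follows. □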

5.2 Applications 

In this section we explore some applications of Theorem 5.1.

Miller, Orosz, and Scheinerman [6] conjectured that, for a fixed constant $0 < c < 1/2$ and $k_n$ varying with $n$ so that $cn \le k_n \le n - cn$, we have

$$\lim_{n\to\infty}\frac{|\{S\subseteq[0,n] : k_n\in S \text{ and } S \text{ is MSTD}\}|}{|\{S\subseteq[0,n] : S \text{ is MSTD}\}|} = \frac12.$$

It was also asked whether the condition $cn \le k_n \le n - cn$ could be replaced by $\alpha(n) \le k_n \le n - \alpha(n)$ for some function $\alpha$. The following result answers these questions. Recall that taking $\Lambda = \{(s,d) : s < d\}$ gives us MSTD sets.

Corollary 5.5. Let $\Lambda$ and $a_n$ satisfy the hypotheses of Theorem 5.1. For each $n$, let $S_n$ be a uniform random subset of $[0,n]$. If $k_n$ is a sequence satisfying $a_n < k_n < n - a_n$, then

$$\lim_{n\to\infty}\mathbb{P}(k_n\in S_n \mid \Delta(S_n)\in\Lambda) = \frac12.$$

Proof. In Corollary 5.2, let $E_n$ be the event $\{k_n\in S_n\}$. □

Now we give some results about the size of a random subset $S \subseteq [0,n]$ satisfying $\Delta(S)\in\Lambda$. Because fringe elements do not contribute significantly to $|S|$, we expect the size of the set to behave like an unrestricted binomial distribution. The next two results confirm this intuition. In the variance part of the next proposition, we need to take the fringe event $F$ in Theorem 5.1 to be nontrivial, thereby using the full power of the theorem.

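Proposition 5.6. Let $\Lambda \subseteq \mathbb{Z}_{\ge 0}\times\mathbb{Z}_{\ge 0}$ contain at least one $(s,d)$ with $d$ even. For each $n$, let $S_n$ be a uniform random subset of $[0,n]$. Then

$$\mathbb{E}\bigl[|S_n| \bigm| \Delta(S_n)\in\Lambda\bigr] = \frac{n+1}{2} + O(\log n) \tag{29}$$

and

$$\operatorname{Var}\bigl(|S_n| \bigm| \Delta(S_n)\in\Lambda\bigr) = \frac{n+1}{4} + O\bigl((\log n)^2\bigr), \tag{30}$$

where the constants in the big-O may depend on $\Lambda$.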

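Proof. Choose $a_n = \lfloor c\log n\rfloor$ for some constant $c > 4/\log(4/3)$, so that $n^2(3/4)^{a_n/2}\to 0$. Let $S'_n = S_n\cap[a_n+1,\,n-a_n-1]$. Applying Theorem 5.1 to the events $E = \{k\in S_n\}$ and $F$ the event of all outcomes, we get

$$\Bigl|\mathbb{E}\bigl[|S'_n| \bigm| \Delta(S_n)\in\Lambda\bigr] - \frac{n-1-2a_n}{2}\Bigr| \le \sum_{k=a_n+1}^{n-a_n-1}\Bigl|\mathbb{P}\bigl(k\in S_n \bigm| \Delta(S_n)\in\Lambda\bigr) - \frac12\Bigr| = O\bigl(n(3/4)^{a_n/2}\bigr) \to 0 \quad\text{as } n\to\infty.$$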
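Thus

$$\begin{aligned}
\mathbb{E}\bigl[|S_n| \bigm| \Delta(S_n)\in\Lambda\bigr] &= \mathbb{E}\bigl[|S'_n| \bigm| \Delta(S_n)\in\Lambda\bigr] + \mathbb{E}\bigl[|S_n\setminus S'_n| \bigm| \Delta(S_n)\in\Lambda\bigr] \\
&= \frac{n-1-2a_n}{2} + o(1) + O(a_n) \\
&= \frac{n+1}{2} + O(\log n).
\end{aligned}$$

This proves (29).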

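Next, for the variance, we have

$$\begin{aligned}
\operatorname{Var}\bigl(|S_n| \bigm| \Delta(S)\in\Lambda\bigr) &= \mathbb{E}\Bigl[\bigl(|S_n| - \mathbb{E}[|S_n| \mid \Delta(S)\in\Lambda]\bigr)^2 \Bigm| \Delta(S)\in\Lambda\Bigr] \\
&= \mathbb{E}\Bigl[\bigl(|S_n| - \tfrac{n+1}{2} + O(\log n)\bigr)^2 \Bigm| \Delta(S)\in\Lambda\Bigr] \\
&= \mathbb{E}\Bigl[\bigl(|S_n| - \tfrac{n+1}{2}\bigr)^2 \Bigm| \Delta(S)\in\Lambda\Bigr] + O(\log n)\,\mathbb{E}\Bigl[|S_n| - \tfrac{n+1}{2} \Bigm| \Delta(S)\in\Lambda\Bigr] + O\bigl((\log n)^2\bigr) \\
&= \mathbb{E}\Bigl[\bigl(|S_n| - \tfrac{n+1}{2}\bigr)^2 \Bigm| \Delta(S)\in\Lambda\Bigr] + O\bigl((\log n)^2\bigr).
\end{aligned}$$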



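For each $i\in[0,n]$, let $X_i$ be the indicator random variable which is $1$ if $i\in S$ and $0$ otherwise. Then

$$\mathbb{E}\Bigl[\bigl(|S_n| - \tfrac{n+1}{2}\bigr)^2 \Bigm| \Delta(S)\in\Lambda\Bigr] = \mathbb{E}\Bigl[\Bigl(\sum_{i=0}^{n}\bigl(X_i - \tfrac12\bigr)\Bigr)^2 \Bigm| \Delta(S)\in\Lambda\Bigr] = \sum_{i=0}^{n}\sum_{j=0}^{n}\mathbb{E}\Bigl[\bigl(X_i - \tfrac12\bigr)\bigl(X_j - \tfrac12\bigr) \Bigm| \Delta(S)\in\Lambda\Bigr]. \tag{31}$$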



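Next we analyze each term $\mathbb{E}\bigl[(X_i - \tfrac12)(X_j - \tfrac12) \mid \Delta(S)\in\Lambda\bigr]$ using Theorem 5.1. There are several cases to consider.

Suppose that $i, j\in[a_n+1,\,n-a_n-1]$. For any event $E$ that depends only on $S\cap\{i,j\}$, we have

$$\bigl|\mathbb{P}(E \mid \Delta(S)\in\Lambda) - \mathbb{P}(E)\bigr| = O\bigl((3/4)^{a_n/2}\bigr).$$

Thus,

$$\mathbb{E}\Bigl[\bigl(X_i - \tfrac12\bigr)\bigl(X_j - \tfrac12\bigr) \Bigm| \Delta(S)\in\Lambda\Bigr] = \mathbb{E}\Bigl[\bigl(X_i - \tfrac12\bigr)\bigl(X_j - \tfrac12\bigr)\Bigr] + O\bigl((3/4)^{a_n/2}\bigr) = O\bigl((3/4)^{a_n/2}\bigr) + \begin{cases}\frac14, & \text{if } i = j,\\ 0, & \text{if } i \ne j.\end{cases}$$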



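Next, suppose that $i\in[a_n+1,\,n-a_n-1]$ and $j\notin[a_n+1,\,n-a_n-1]$ (or vice versa). If the event $E$ is either $\{i\in S\}$ or $\{i\notin S\}$ and the event $F$ is either $\{j\in S\}$ or $\{j\notin S\}$, then

$$\bigl|\mathbb{P}(E\cap F \mid \Delta(S)\in\Lambda) - \mathbb{P}(E)\,\mathbb{P}(F \mid \Delta(S)\in\Lambda)\bigr| = O\bigl((3/4)^{a_n/2}\bigr).$$

Also, since $\mathbb{E}[X_i - \tfrac12] = 0$,

$$\mathbb{E}\Bigl[\bigl(X_i - \tfrac12\bigr)\bigl(X_j - \tfrac12\bigr) \Bigm| \Delta(S)\in\Lambda\Bigr] = \mathbb{E}\bigl[X_i - \tfrac12\bigr]\,\mathbb{E}\Bigl[X_j - \tfrac12 \Bigm| \Delta(S)\in\Lambda\Bigr] + O\bigl((3/4)^{a_n/2}\bigr) = O\bigl((3/4)^{a_n/2}\bigr).$$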

Finally, if $i, j\notin[a_n+1,\,n-a_n-1]$, then we simply use the crude bound

$$-\tfrac14 \le \mathbb{E}\Bigl[\bigl(X_i - \tfrac12\bigr)\bigl(X_j - \tfrac12\bigr) \Bigm| \Delta(S)\in\Lambda\Bigr] \le \tfrac14.$$
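Combining all three cases and continuing (31), we get

$$\begin{aligned}
\operatorname{Var}\bigl(|S_n| \bigm| \Delta(S_n)\in\Lambda\bigr) &= \mathbb{E}\Bigl[\bigl(|S_n| - \tfrac{n+1}{2}\bigr)^2 \Bigm| \Delta(S)\in\Lambda\Bigr] + O\bigl((\log n)^2\bigr) \\
&= \sum_{i=0}^{n}\sum_{j=0}^{n}\mathbb{E}\Bigl[\bigl(X_i - \tfrac12\bigr)\bigl(X_j - \tfrac12\bigr) \Bigm| \Delta(S)\in\Lambda\Bigr] + O\bigl((\log n)^2\bigr) \\
&= \frac{n+1}{4} + O\bigl(n^2(3/4)^{a_n/2}\bigr) + O(a_n^2) + O\bigl((\log n)^2\bigr) \\
&= \frac{n+1}{4} + O\bigl((\log n)^2\bigr).
\end{aligned}$$

□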



The next result shows that the size of S follows a central limit theorem. 

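Proposition 5.7. Let $\Lambda \subseteq \mathbb{Z}_{\ge 0}\times\mathbb{Z}_{\ge 0}$ contain at least one $(s,d)$ with $d$ even. For each $n$, let $S_n$ be a uniform random subset of $[0,n]$. Then, for any real number $t$, we have

$$\lim_{n\to\infty}\mathbb{P}\Bigl(|S_n| \le \frac{n + t\sqrt{n}}{2} \Bigm| \Delta(S_n)\in\Lambda\Bigr) = \Phi(t),$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution.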

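Proof. Choose any $a_n = o(\sqrt{n})$ with $a_n\to\infty$, and let $S'_n$ denote $S_n\cap[a_n+1,\,n-a_n-1]$. Since $|S_n\setminus S'_n| \le 2a_n + 2$, we have

$$\mathbb{P}\Bigl(|S'_n| \le \frac{n+t\sqrt{n}}{2} - 2a_n - 2 \Bigm| \Delta(S_n)\in\Lambda\Bigr) \le \mathbb{P}\Bigl(|S_n| \le \frac{n+t\sqrt{n}}{2} \Bigm| \Delta(S_n)\in\Lambda\Bigr) \le \mathbb{P}\Bigl(|S'_n| \le \frac{n+t\sqrt{n}}{2} \Bigm| \Delta(S_n)\in\Lambda\Bigr).$$

Using Corollary 5.2 and the Central Limit Theorem, we find that

$$\lim_{n\to\infty}\mathbb{P}\Bigl(|S'_n| \le \frac{n+t\sqrt{n}}{2} \Bigm| \Delta(S_n)\in\Lambda\Bigr) = \lim_{n\to\infty}\mathbb{P}\Bigl(|S'_n| \le \frac{n+t\sqrt{n}}{2}\Bigr) = \Phi(t).$$

Similarly,

$$\lim_{n\to\infty}\mathbb{P}\Bigl(|S'_n| \le \frac{n+t\sqrt{n}}{2} - 2a_n - 2 \Bigm| \Delta(S_n)\in\Lambda\Bigr) = \Phi(t).$$

Therefore

$$\lim_{n\to\infty}\mathbb{P}\Bigl(|S_n| \le \frac{n+t\sqrt{n}}{2} \Bigm| \Delta(S_n)\in\Lambda\Bigr) = \Phi(t). \qquad\square$$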



6 Conclusion and discussion 

This paper explores the intuition about the structure of a random MSTD set, namely that its fringe elements are significant while its middle elements are not. Consequently, we can compute the proportion of MSTD sets by searching through all desirable fringe pairs and summing the contributions from each fringe pair. We were also able to make precise statements about how the middle elements are nearly unrestricted and independent of the fringe elements.

More generally, our results apply to any characterization $\Lambda$ on the number of missing sums and the number of missing differences of $S \subseteq \{0,1,\dots,n\}$. Our methods can also be modified to deal with the following two extensions, though we choose not to discuss them in order to keep the arguments simple.

• Our paper is based on the model where each element of {0,1, ... ,n} is chosen indepen- 
dently with probability 1/2. Our results can be modified to deal with the model where the 
probability is some other constant (independent of n). 

• We can place additional constraints on the fringe of $S$. For example, in addition to requiring $\Delta(S)\in\Lambda$, we may further require that $0, 1, n \in S$ and $4, n-1 \notin S$. This amounts to including or excluding a certain subset of prefix-suffix pairs.

Our method currently does not easily extend to the model where each element is chosen with probability $p(n)$ varying with $n$. For results in this direction, Hegarty and Miller [2] showed that if $p(n)\to 0$ and $n^{-1} = o(p(n))$, then a random subset almost always has more differences than sums. It would be interesting to see whether there are analogues of Theorem 5.1 other than in the uniform model with constant probability.

We showed that each limit $\rho^\Lambda$ can be computed deterministically up to arbitrary precision. However, in practice the convergence is very slow, since each term requires a complete search. Also, error bounds such as Lemma 2.12 are too weak to give good numerical results. In the MSTD case we were able to substantially speed up the computation by splitting a rich set into two semi-rich sets and analyzing each half separately. Unfortunately, in the general case there does not seem to be a good way to split up an affluent set. Consequently, we expect the computation in the general case to be much slower.

It would be nice to find some optimization that could substantially speed up the algorithm. For instance, perhaps we do not have to perform so many complete searches, or perhaps there is some way to divide an affluent set into nearly independent parts. It would also be nice to have a tighter upper bound than the one provided by Lemma 2.12.

In practice, if we wish to estimate any $\rho^\Lambda$, the easiest and quickest way is to run a Monte Carlo simulation. However, this approach cannot give any provable bounds.

We conclude with some possible further questions. 

1. For each fringe pair $(A,B;k)$, can we give an explicit construction of a family of rich/affluent sets that occupy $\Theta(1)$ density?

2. What can we say if we choose to characterize $S$ by $(|S+S|, |S-S|)$ instead of the number of missing sums and differences? In this case, which subsets of $\mathbb{Z}_{\ge 0}\times\mathbb{Z}_{\ge 0}$ give interesting results?

3. How quickly does $\rho_n^\Lambda$ converge to $\rho^\Lambda$? Our proofs do not say anything about this. The convergence mentioned in this paper is the convergence of the computed numerical bound, which depends on the order $k$ of the fringe pairs rather than on $n$.

4. For which $\Lambda$ is the sequence $\{\rho_n^\Lambda\}$ monotonic? Martin and O'Bryant [Si suggest that perhaps it is monotonically increasing for $\{(s,d) : s < d\}$ and $\{(s,d) : s > d\}$, and monotonically decreasing for $\{(s,d) : s = d\}$. Is the sequence $\{\rho_n^\Lambda\}$ always eventually monotonic? When does it approach the limit from above, and when from below?

5. Can we improve the error term in Proposition 5.6 for the expectation and variance of $|S|$? For which $\Lambda$ is the error term asymptotically tight?